THE MAXIMUM LIKELIHOOD DEGREE OF MIXTURES OF INDEPENDENCE MODELS


JOSE ISRAEL RODRIGUEZ AND BOTONG WANG 


Abstract. The maximum likelihood degree (ML degree) measures the algebraic complexity of a fundamental optimization problem in statistics: maximum likelihood estimation. In this problem, one maximizes the likelihood function over a statistical model. The ML degree of a model is an upper bound on the number of local extrema of the likelihood function and can be expressed as a weighted sum of Euler characteristics. The independence model (i.e., rank-one matrices over the probability simplex) is well known to have ML degree one, meaning that the likelihood function has a unique local maximum. However, for mixtures of independence models (i.e., rank-two matrices over the probability simplex), it was an open question how the ML degree behaves. In this paper, we use Euler characteristics to prove an outstanding conjecture by Hauenstein, the first author, and Sturmfels; we give recursions and closed form expressions for the ML degree of mixtures of independence models.


1. Introduction 


Maximum likelihood estimation is a fundamental computational task in statistics. A typical problem encountered in its applications is the occurrence of multiple local maxima. To be certain that a global maximum of the likelihood function has been achieved, one locates all solutions to a system of polynomial equations called the likelihood equations; every local maximum is a solution to these equations. The number of solutions to these equations is called the maximum likelihood degree (ML degree) of the model. This degree was introduced in [3, 14] and measures the complexity of the global optimization problem, as it bounds the number of local maxima.

The maximum likelihood degree has been studied in many contexts, including Gaussian graphical models [25], variance component models [11], and missing data [15]. In this manuscript, we work in the context of discrete random variables (for a recent survey in this context, see [18]). In our main results, we provide closed form expressions for ML degrees of mixtures of independence models, which are sets of joint probability distributions for two random variables. This answers the outstanding Conjecture 4.1 in [13].

1.1. Algebraic statistics preliminaries. We consider a model for two discrete random variables, having $m$ and $n$ states respectively. A joint probability distribution for two such random variables is written as an $m \times n$ matrix:

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{pmatrix}. \tag{1}$$

The $(i,j)$th entry $p_{ij}$ represents the probability that the first variable is in state $i$ and the second variable is in state $j$. By a statistical model, we mean a subset $\mathcal{M}$ of the probability simplex $\Delta_{mn-1}$ of all such matrices $P$. The $(d-1)$-dimensional probability simplex is defined as
$$\Delta_{d-1} := \{(p_1, \dots, p_d) \in \mathbb{R}^d_{\geq 0} \text{ such that } p_1 + \cdots + p_d = 1\}.$$

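If i.i.d. samples are drawn from some distribution $P$, then the data are summarized by the matrix $u$ in (2). The entries of $u$ are nonnegative integers, where $u_{ij}$ is the number of samples drawn with state $(i,j)$:

$$u = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ u_{m1} & u_{m2} & \cdots & u_{mn} \end{pmatrix}. \tag{2}$$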


The likelihood function corresponding to the data matrix $u$ is given by
$$\ell_u(p) := p_{11}^{u_{11}} p_{12}^{u_{12}} \cdots p_{mn}^{u_{mn}}. \tag{3}$$

Maximum likelihood estimation is an optimization problem for the likelihood function. This problem consists of determining, for fixed $u$, the argmax of $\ell_u(p)$ on a statistical model $\mathcal{M}$. The optimal solution is called the maximum likelihood estimate (mle) and is used to estimate the true probability distribution. For the models we consider, the mle is a solution to the likelihood equations. In other words, by solving the likelihood equations, we solve the maximum likelihood estimation problem. Since the ML degree is the number of solutions to the likelihood equations, it measures the difficulty of the problem.
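To make the objective concrete, here is a minimal Python sketch, assuming $u$ and a candidate $P$ are given as nested lists; the helper names are hypothetical. It works with $\log \ell_u$, which has the same maximizers as $\ell_u$ on distributions with positive entries. For comparison, it also evaluates the well-known closed-form mle of the independence model (rank-one matrices): row sum times column sum, divided by the squared sample size.

```python
import math

def log_likelihood(p, u):
    """Return log l_u(p) = sum over (i, j) of u_ij * log(p_ij)."""
    return sum(
        u_ij * math.log(p_ij)
        for p_row, u_row in zip(p, u)
        for p_ij, u_ij in zip(p_row, u_row)
    )

def independence_mle(u):
    """MLE in the independence (rank-one) model: p_ij = r_i * c_j / N**2,
    where r_i, c_j are the row and column sums of u and N is the sample size."""
    n_total = sum(sum(row) for row in u)
    row_sums = [sum(row) for row in u]
    col_sums = [sum(col) for col in zip(*u)]
    return [[r * c / n_total**2 for c in col_sums] for r in row_sums]

u = [[2, 8], [5, 10]]                 # illustrative 2 x 2 data matrix
p_hat = independence_mle(u)           # entries 70/625, 180/625, 105/625, 270/625
uniform = [[0.25, 0.25], [0.25, 0.25]]
print(log_likelihood(p_hat, u) > log_likelihood(uniform, u))  # True
```

The data here are the same numbers as in Example 1.2 below, and the estimate is proportional to the critical point found there.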

The model $\mathcal{M}_{mn} \subset \Delta_{mn-1}$ is called the mixture of independence models and is defined to be the image of the following map:
$$\Delta_{m-1} \times \Delta_{n-1} \times \Delta_{m-1} \times \Delta_{n-1} \times \Delta_{1} \to \mathcal{M}_{mn}, \qquad (R_1, B_1, R_2, B_2, C) \mapsto c_1 R_1 B_1^T + c_2 R_2 B_2^T,$$
where $R_i$, $B_i$, and $C = [c_1, c_2]^T$ are $m \times 1$, $n \times 1$, and $2 \times 1$ matrices, respectively, with positive entries that sum to one. The Zariski closure of $\mathcal{M}_{mn}$ is the variety of complex matrices of rank at most 2. We will determine the ML degree of the models $\mathcal{M}_{mn}$ by studying the topology of this Zariski closure. Prior to our work, the ML degrees of these models were known only for small values of $m$ and $n$. In [13], the following table of ML degrees of $\mathcal{M}_{mn}$ was computed:


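$$\begin{array}{c|ccccc} & n = 3 & n = 4 & n = 5 & n = 6 & n = 7 \\ \hline m = 3 & 10 & 26 & 58 & 122 & 250 \\ m = 4 & 26 & 191 & 843 & 3119 & \\ m = 5 & 58 & 843 & 6776 & & \end{array}$$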


Mixtures of independence models appear in many places in science, statistics, and mathematics. In computational biology, the case where $(m,n)$ equals $(4,4)$ is discussed in Example 1.3 of [21], and the data $u$ consist of a pair of DNA sequences of length $u_{11} + u_{12} + \cdots + u_{mn}$. Here $m$ and $n$ equal four because DNA molecules are composed of four nucleotides. Another interesting case for computational biology is $(m,n) = (20,20)$, because there are 20 standard amino acids [Remark 4.3].

Our first main result, Theorem 3.12, proves a formula for the first row of the table:
$$\mathrm{MLdeg}\,\mathcal{M}_{3n} = 2^{n+1} - 6 \quad \text{for } n \geq 3.$$
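As a quick consistency check, the Python sketch below compares this closed form with the first row of the table (the function name `ml_degree_3n` is hypothetical).

```python
# ML degrees of M_{3n} for n = 3..7, copied from the table above.
table_row_m3 = {3: 10, 4: 26, 5: 58, 6: 122, 7: 250}

def ml_degree_3n(n):
    """Closed form 2^(n+1) - 6 from Theorem 3.12, stated for n >= 3."""
    if n < 3:
        raise ValueError("the formula is stated for n >= 3")
    return 2 ** (n + 1) - 6

for n, value in table_row_m3.items():
    assert ml_degree_3n(n) == value

print([ml_degree_3n(n) for n in range(3, 11)])
# [10, 26, 58, 122, 250, 506, 1018, 2042]
```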

Our second main result is a recursive expression for the ML degree of mixtures of independence models. As a consequence, we are able to calculate a closed form expression for each row of the above table of ML degrees [Corollary 4.2].

Our techniques relate ML degrees to Euler characteristics. Previously, Huh showed that the ML degree of a smooth algebraic statistical model $\mathcal{M}$ with Zariski closure $X$ equals the signed topological Euler characteristic of an open subvariety $X^\circ$, where $X^\circ$ is the set of points of $X$ with nonzero coordinates and nonzero coordinate sum [17]. More recent work of Budur and the second author shows that the ML degree of a singular model is a stratified topological invariant. In [1], they show that the Euler characteristic of $X^\circ$ is a sum of ML degrees weighted by Euler obstructions. These Euler obstructions measure, in a sense, the multiplicity of the singular locus.

We conclude this introduction with illustrative examples that set notation and definitions.

1.2. Defining the maximum likelihood degree. We will use two notions of maximum likelihood degree. The first notion is from a computational algebraic geometry perspective, where we define the maximum likelihood degree of a projective variety. When this projective variety is contained in a hyperplane, the maximum likelihood degree has a statistical interpretation. The second notion is from a topological perspective, where we define the maximum likelihood degree of a very affine variety, that is, a subvariety of an algebraic torus $(\mathbb{C}^*)^n$.

To $\mathbb{P}^{n+1}$ we associate the coordinates $p_0, p_1, \dots, p_n, p_s$ (where $s$ stands for sum). Consider the distinguished hyperplane in $\mathbb{P}^{n+1}$ defined by $p_0 + \cdots + p_n - p_s = 0$ (so $p_s$ is the sum of the other coordinates).

Let $X$ be a generically reduced variety contained in the distinguished hyperplane of $\mathbb{P}^{n+1}$ and not contained in any coordinate hyperplane. We will be interested in the critical points of the likelihood function
$$\ell_u(p) := p_0^{u_0} p_1^{u_1} \cdots p_n^{u_n} p_s^{u_s}, \tag{4}$$
where $u_s := -u_0 - \cdots - u_n$ and $u_0, \dots, u_n \in \mathbb{C}$. The likelihood function has the nice property that, up to scaling, its gradient is a rational function: $\nabla \ell_u(p) := \left[\frac{u_0}{p_0} : \cdots : \frac{u_n}{p_n} : \frac{u_s}{p_s}\right]$.

Definition 1.1. Let $u$ be fixed. A point $p \in X$ is said to be a critical point of the likelihood function on $X$ if $p$ is a regular point of $X$, each coordinate of $p$ is nonzero, and the gradient $\nabla \ell_u(p)$ at $p$ is orthogonal to the tangent space of $X$ at $p$.

Example 1.2. Let $X \subset \mathbb{P}^4$ be defined by $p_0 + p_1 + p_2 + p_3 - p_s$ and $p_0 p_3 - p_1 p_2$. For $[u_0 : u_1 : u_2 : u_3 : u_s] = [2 : 8 : 5 : 10 : -25]$ there is a unique critical point of $\ell_u(p)$ on $X$, namely $[p_0 : p_1 : p_2 : p_3 : p_s] = [70 : 180 : 105 : 270 : 625]$. In general, whenever the following coordinates are nonzero, there is a unique critical point
$$[(u_0 + u_1)(u_0 + u_2) : (u_0 + u_1)(u_1 + u_3) : (u_2 + u_3)(u_0 + u_2) : (u_2 + u_3)(u_1 + u_3) : (u_0 + u_1 + u_2 + u_3)^2].$$
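The closed form in Example 1.2 can be checked mechanically. The Python sketch below (helper names hypothetical) uses exact rational arithmetic: it tests that the point lies on $X$ and that the vector $[u_0/p_0 : \cdots : u_s/p_s]$ lies in the span of the gradients of the two defining equations, which is the orthogonality condition of Definition 1.1. Because $X$ lies in the distinguished hyperplane, the last coordinate is taken to be $p_s = p_0 + p_1 + p_2 + p_3 = (u_0 + u_1 + u_2 + u_3)^2$.

```python
from fractions import Fraction as F

def critical_point(u0, u1, u2, u3):
    """Closed-form critical point from Example 1.2, with p_s = p0+p1+p2+p3."""
    total = u0 + u1 + u2 + u3
    return [(u0 + u1) * (u0 + u2), (u0 + u1) * (u1 + u3),
            (u2 + u3) * (u0 + u2), (u2 + u3) * (u1 + u3), total ** 2]

def is_critical(u, p):
    """Check that p lies on X and that [u_0/p_0 : ... : u_s/p_s] is a linear
    combination of the gradients of p0+p1+p2+p3-ps and p0*p3-p1*p2 at p."""
    p0, p1, p2, p3, ps = p
    if p0 + p1 + p2 + p3 - ps != 0 or p0 * p3 - p1 * p2 != 0:
        return False
    us = -sum(u)
    grad = [F(ui, pi) for ui, pi in zip(list(u) + [us], p)]
    n1 = [1, 1, 1, 1, -1]          # gradient of the linear equation
    n2 = [p3, -p2, -p1, p0, 0]     # gradient of the quadric p0*p3 - p1*p2
    a = -grad[4]                   # only n1 contributes to the p_s coordinate
    b = (grad[0] - a) / p3         # then match the p_0 coordinate
    return all(g == a * x + b * y for g, x, y in zip(grad, n1, n2))

u = (2, 8, 5, 10)
p = critical_point(*u)
print(p)                   # [70, 180, 105, 270, 625]
print(is_critical(u, p))   # True
```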





Definition 1.3. The maximum likelihood degree of $X$ is defined to be the number of critical points of the likelihood function on $X$ for general $u_0, \dots, u_n$. The maximum likelihood degree of $X$ is denoted $\mathrm{MLdeg}(X)$.

We say that $u^* \in \mathbb{C}^{n+1}$ is general whenever there exists a dense Zariski open set $U$ on which the number of critical points of $\ell_u(p)$ is constant and $u^* \in U$. In Example 1.2, the Zariski open set $U$ is the complement of the variety defined by $(u_0 + u_1)(u_0 + u_2)(u_1 + u_3)(u_2 + u_3)u_s = 0$. Determining this Zariski open set explicitly is often quite difficult, but it is not necessary when using reliable probabilistic algorithms to compute ML degrees, as done in ?? for example. In this paper, however, we compute ML degrees using Euler characteristics and topological arguments.
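For Example 1.2 the open set $U$ is explicit, so membership can be tested directly. A minimal Python sketch (the function name is hypothetical):

```python
def is_general_example_1_2(u0, u1, u2, u3):
    """True when (u0+u1)(u0+u2)(u1+u3)(u2+u3)u_s != 0, u_s = -(u0+u1+u2+u3),
    i.e. when u lies in the open set U described for Example 1.2."""
    us = -(u0 + u1 + u2 + u3)
    return (u0 + u1) * (u0 + u2) * (u1 + u3) * (u2 + u3) * us != 0

print(is_general_example_1_2(2, 8, 5, 10))   # True
print(is_general_example_1_2(1, -1, 2, 3))   # False, because u0 + u1 = 0
```

For the data used in Example 1.2, all five factors are nonzero, so that data vector is general.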

1.3. Using Euler characteristics. First, we recall that the Euler characteristic of a topological space $X$ is defined as
$$\chi(X) = \sum_{i \geq 0} (-1)^i \dim H^i(X, \mathbb{Q}),$$
where the $H^i$ are the singular cohomology groups.
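The definition only needs the Betti numbers $b_i = \dim H^i(X, \mathbb{Q})$. A short Python sketch, with the Betti numbers of a few standard spaces entered by hand (these inputs are standard facts, not computed here; the function name is hypothetical):

```python
def euler_characteristic(betti):
    """Return sum_i (-1)^i * betti[i], where betti[i] = dim H^i(X, Q)."""
    return sum((-1) ** i * b for i, b in enumerate(betti))

print(euler_characteristic([1, 0, 1]))  # P^1, a 2-sphere: 2
print(euler_characteristic([1, 1]))     # C^*, homotopy equivalent to a circle: 0
print(euler_characteristic([1, 2]))     # P^1 minus 3 points: -1
```

The last value is the Euler characteristic of the space $\mathcal{O}$ used in Example 1.5.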

In the definition of the maximum likelihood degree of a projective variety $X$, a critical point $p \in X$ must have nonzero coordinates. This means that all critical points of the likelihood function are contained in the underlying very affine variety
$$X^\circ := X \setminus \{\text{coordinate hyperplanes}\}.$$
In fact, the ML degree is directly related to the Euler characteristic of a smooth $X^\circ$.

Theorem 1.4 ([17]). Suppose $X$ is a smooth projective variety of $\mathbb{P}^{n+1}$. Then
$$\chi(X^\circ) = (-1)^{\dim X}\, \mathrm{MLdeg}\, X. \tag{5}$$

The next example shows how to determine the signed Euler characteristic of a very affine variety. Recall that the Euler characteristic is a homotopy invariant and satisfies the following properties. The Euler characteristic is additive for algebraic varieties: more precisely, $\chi(X) = \chi(Z) + \chi(X \setminus Z)$, where $Z$ is a closed subvariety of $X$ (see [8, Section 4.5] for example). The product property says $\chi(M \times N) = \chi(M) \cdot \chi(N)$. More generally, the fibration property says that if $E \to B$ is a fibration with fiber $F$, then $\chi(E) = \chi(F) \cdot \chi(B)$ (see [24, Section 9.3] for example).

Example 1.5. Consider $X$ from Example 1.2. The variety $X$ has the parameterization shown below:
$$\mathbb{P}^1 \times \mathbb{P}^1 \to X, \qquad [x_0 : x_1] \times [y_0 : y_1] \mapsto [x_0y_0 : x_0y_1 : x_1y_0 : x_1y_1 : x_0y_0 + x_0y_1 + x_1y_0 + x_1y_1].$$
Let $X^\circ$ be the underlying very affine variety of $X$, and consider $\mathcal{O} := \mathbb{P}^1 \setminus \{[0 : 1], [1 : 0], [1 : -1]\}$, the projective line with 3 points removed. The very affine variety $X^\circ$ has the parameterization
$$\mathcal{O} \times \mathcal{O} \to X^\circ, \qquad [x_0 : x_1] \times [y_0 : y_1] \mapsto [x_0y_0 : x_0y_1 : x_1y_0 : x_1y_1 : (x_0 + x_1)(y_0 + y_1)].$$
Since $\chi(\mathbb{P}^1) = 2$, after removing 3 points we have $\chi(\mathcal{O}) = -1$. By the product property, $\chi(\mathcal{O} \times \mathcal{O}) = 1$, and hence $\chi(X^\circ) = 1$. Because $X^\circ$ is smooth, by Huh's result we conclude that the ML degree of $X$ is 1 as well.

We call $X$ the variety of $2 \times 2$ matrices of rank 1; restricting to the probability simplex yields the independence model. This example generalizes by considering the map $\mathbb{P}^{m-1} \times \mathbb{P}^{n-1} \to X$ given by
$$([x_0 : \cdots : x_{m-1}], [y_0 : \cdots : y_{n-1}]) \mapsto [x_0y_0 : \cdots : x_{m-1}y_{n-1} : (x_0 + \cdots + x_{m-1})(y_0 + \cdots + y_{n-1})].$$
In this case, $X$ is the variety of $m \times n$ rank 1 matrices, and a similar computation shows that the ML degree is 1 in these cases as well (see Example 1 of [13]).
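The "similar computation" can be spelled out as a short derivation. This is a sketch: as in Example 1.5, the parameterization identifies $X^\circ$ with a product $\mathcal{O}_m \times \mathcal{O}_n$, where $\mathcal{O}_m \subset \mathbb{P}^{m-1}$ denotes the complement of the $m+1$ hyperplanes $x_0 = 0, \dots, x_{m-1} = 0$ and $x_0 + \cdots + x_{m-1} = 0$ (the notation $\mathcal{O}_m$ is introduced here only for this sketch). Setting $x_0 = 1$ turns $\mathcal{O}_m$ into the complement of $m$ hyperplanes in general position in $\mathbb{C}^{m-1}$, so Lemma 3.5 (case $r = s + 1$ with $s = m - 1$) applies, and together with (5) and $\dim X = m + n - 2$ we get

$$\chi(\mathcal{O}_m) = (-1)^{m-1}, \qquad \chi(X^\circ) = \chi(\mathcal{O}_m)\,\chi(\mathcal{O}_n) = (-1)^{m+n}, \qquad \mathrm{MLdeg}\,X = (-1)^{\dim X}\chi(X^\circ) = (-1)^{m+n-2}(-1)^{m+n} = 1.$$

For $m = n = 2$, $\mathcal{O}_2$ is the space $\mathcal{O}$ of Example 1.5, and this recovers $\chi(\mathcal{O}) = -1$.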

2. Whitney stratification, Gaussian degree, and Euler obstruction 

As we have just seen, for a smooth very affine variety, the ML degree is equal to the Euler characteristic up to a sign. This need not hold when the very affine variety is singular. In that case, the ML degree is related to the Euler characteristic in a subtle way involving Whitney stratifications. This subtlety is explained in Corollary 2.9 by considering a weighted sum of the ML degrees of the strata. Here, we give a brief introduction to the topological notions of Whitney stratification and Euler obstruction.

2.1. Whitney stratification. Many differential geometric notions, for instance tangent bundles and Poincaré duality, do not behave well when a variety has singularities. This situation is addressed by stratifying the singular variety into finitely many pieces, such that along each piece the variety is close to a smooth variety. A naive way to stratify a variety $X$ is to take the regular locus $X_{\mathrm{reg}}$ as the first stratum, then take the regular part of the singular locus of $X$, i.e., $(X_{\mathrm{sing}})_{\mathrm{reg}}$, and repeat this procedure. This naive stratification does not always reflect the singular behavior of a variety, as seen in the Whitney umbrella. Whitney introduced conditions on a stratification (now called Whitney regularity) under which many differential geometric results can be generalized to singular varieties. A Whitney stratification satisfies the following technical condition of Whitney regularity. The definition below is from [10, Page 37]. See also [16, E3.7].

Definition 2.1. Let $X$ be an analytic subvariety of a complex manifold $M$. Let $X = \bigsqcup_{\alpha \in A} S_\alpha$ be a stratification of $X$ into finitely many locally closed submanifolds of $M$. This stratification is called a Whitney stratification if all the pairs $(S_\alpha, S_\beta)$ with $S_\beta \subset \overline{S_\alpha}$ are Whitney regular, which means the following.

Suppose $x_i \in S_\alpha$ is a sequence of points converging to some point $y \in S_\beta$. Suppose $y_i \in S_\beta$ also converge to $y$, and suppose that (with respect to some local coordinate system on $M$) the secant lines $\ell_i = \overline{x_i y_i}$ converge to some limiting line $\ell$, and the tangent planes $T_{x_i} S_\alpha$ converge to some limiting plane $\tau$. Then $\ell \subset \tau$.

Example 2.2. Here are some instances of Whitney stratifications.

(1) [26, Theorem 19.2] Every complex algebraic and analytic variety admits a Whitney stratification.

(2) If $X$ is a smooth variety, then the trivial stratification, consisting of $X$ itself, is a Whitney stratification.

(3) If $X$ has isolated singularities at $P_1, \dots, P_l$, then the stratification $\{P_1\} \sqcup \cdots \sqcup \{P_l\} \sqcup (X \setminus \{P_1, \dots, P_l\})$ is a Whitney stratification.

(4) Since Whitney regularity is a local condition, if $\bigsqcup_{i \in \Lambda} S_i$ is a Whitney stratification of an algebraic variety $X$ and $U$ is an open subset of $X$, then $\bigsqcup_{i \in \Lambda} (S_i \cap U)$ is a Whitney stratification of $U$.

(5) If $X$ is a smooth algebraic variety and $Z \subset X$ is a closed smooth subvariety, then the pair $\{Z, X \setminus Z\}$ is also a Whitney stratification of $X$.

(6) If $\bigsqcup_{i \in I} S_i$ is a Whitney stratification of $X$ and $\bigsqcup_{j \in J} T_j$ is a Whitney stratification of $Y$, then $\bigsqcup_{i \in I, j \in J} S_i \times T_j$ is a Whitney stratification of $X \times Y$.

Proposition 2.3. Let $X$ be an algebraic variety. Suppose an algebraic group $\mathcal{G}$ acts on $X$ with only finitely many orbits $O_1, \dots, O_l$. Then $O_1 \sqcup \cdots \sqcup O_l$ form a Whitney stratification of $X$.

Proof. First, we need to show that each $O_i$ is a locally closed smooth subvariety of $X$. Let $f : \mathcal{G} \times X \to X$ be the group action map. Then each orbit is of the form $f(\mathcal{G} \times \{\text{point}\})$, and hence is a constructible subset of $X$. On the other hand, since $\mathcal{G}$ acts transitively on each $O_i$, each $O_i$, as a constructible subset of $X$, is isomorphic to a smooth variety. This implies that if $\dim O_i = \dim X$, then $O_i$ is open in $X$. Therefore, $X \setminus \bigcup_{\dim O_i = \dim X} O_i$ is a closed (possibly reducible) subvariety of $X$. Now, $\mathcal{G}$ acts on $X \setminus \bigcup_{\dim O_i = \dim X} O_i$ with finitely many orbits. Thus, we can use induction to conclude that each $O_i$ is a locally closed smooth subvariety of $X$.

Without loss of generality, we assume that $O_1 \subset \overline{O_2}$, and we need to show that the pair $O_1, O_2$ is Whitney regular. Moreover, by replacing $X$ by $\overline{O_2}$, we can assume that $O_2$ is open in $X$. By [26, Lemma 19.3], there exists an open subvariety $U$ of $O_1$ such that the pair $U, O_2$ is Whitney regular. Given any $\tau \in \mathcal{G}$, the action $\tau : X \to X$ is an algebraic isomorphism that preserves $O_1$ and $O_2$. Thus, the pair $\tau(U), O_2$ is also Whitney regular. Since $\mathcal{G}$ acts transitively on $O_1$,
$$\bigcup_{\tau \in \mathcal{G}} \tau(U) = O_1.$$
Therefore, the pair $O_1, O_2$ is Whitney regular. □

Consider $\mathbb{P}^{mn-1} = \{[a_{ij}]_{1 \leq i \leq m, 1 \leq j \leq n}\}$ as the projective space of $m \times n$ matrices. The left $GL(m, \mathbb{C})$ action and the right $GL(n, \mathbb{C})$ action on $\mathbb{P}^{mn-1}$ both preserve the rank of the matrices. Moreover, the orbits of the total action of $GL(m, \mathbb{C}) \times GL(n, \mathbb{C})$ are the sets of matrices of fixed rank. Therefore, by Proposition 2.3, the stratification by rank gives a Whitney stratification of the subvariety $X_r := \{[a_{ij}] \mid \mathrm{rank}([a_{ij}]) \leq r\}$. In particular, we have the following corollary.

Corollary 2.4. Define $X := \{[a_{ij}] \mid \mathrm{rank}([a_{ij}]) \leq 2\}$ and $Z := \{[a_{ij}] \mid \mathrm{rank}([a_{ij}]) = 1\}$. Then the stratification $Z, X \setminus Z$ is a Whitney stratification of $X$.
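Because the strata are orbits of $GL(m, \mathbb{C}) \times GL(n, \mathbb{C})$, deciding which stratum contains a given matrix amounts to computing its rank. A minimal Python sketch over the rationals (exact arithmetic avoids spurious ranks from rounding); `matrix_rank` and `stratum` are hypothetical helpers:

```python
from fractions import Fraction

def matrix_rank(matrix):
    """Exact rank over Q by Gaussian elimination (entries: ints or Fractions)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def stratum(matrix):
    """Label a nonzero matrix of rank <= 2 by its stratum in Corollary 2.4."""
    r = matrix_rank(matrix)
    if r == 0:
        raise ValueError("the zero matrix is not a point of projective space")
    if r > 2:
        raise ValueError("rank > 2: the point does not lie on X")
    return "Z" if r == 1 else "X \\ Z"

print(stratum([[1, 2, 3], [2, 4, 6]]))             # Z
print(stratum([[1, 0, 1], [0, 1, 1], [1, 1, 2]]))  # X \ Z
```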

We use Whitney stratifications in Corollary 2.9. We will see that, up to a sign, the ML degree of a singular variety equals the Euler characteristic of its regular part adjusted by some correction terms. The correction terms are linear combinations of the ML degrees of the lower-dimensional strata of the Whitney stratification, whose coefficients turn out to be the Euler obstructions.

2.2. Gaussian degrees. We have defined the notion of maximum likelihood degree of a projective variety. Sometimes, it is more convenient to restrict the projective variety to an affine torus and consider the maximum likelihood degree of a subvariety of the affine torus.
We call a closed irreducible subvariety of an affine torus $(\mathbb{C}^*)^n$ a very affine variety. Denote the coordinates of $(\mathbb{C}^*)^n$ by $z_1, z_2, \dots, z_n$. The likelihood functions on the affine torus $(\mathbb{C}^*)^n$ are of the form
$$\ell_u = z_1^{u_1} z_2^{u_2} \cdots z_n^{u_n}.$$

Definition 2.5. Let $X \subset (\mathbb{C}^*)^n$ be a very affine variety. Define the maximum likelihood degree of $X$, denoted by $\mathrm{MLdeg}^\circ(X)$, to be the number of critical points of a likelihood function $\ell_u$ on $X$ for general $u_1, u_2, \dots, u_n$.

Fix an embedding $\mathbb{P}^n \to \mathbb{P}^{n+1}$ by $[p_0 : p_1 : \cdots : p_n] \mapsto [p_0 : p_1 : \cdots : p_n : p_0 + p_1 + \cdots + p_n]$. Given a projective variety $X \subset \mathbb{P}^n$, we can consider it as a subvariety of $\mathbb{P}^{n+1}$ by this embedding. Then, as a subvariety of $\mathbb{P}^{n+1}$, $X$ is contained in the hyperplane
$$p_0 + p_1 + \cdots + p_n - p_s = 0.$$

Consider $(\mathbb{C}^*)^{n+1}$ as an open subvariety of $\mathbb{P}^{n+1}$, given by the open embedding
$$(z_0, z_1, \dots, z_n) \mapsto [z_0 : z_1 : \cdots : z_n : 1].$$
Now, for the projective variety $X \subset \mathbb{P}^n$, we can embed $X$ into $\mathbb{P}^{n+1}$ as described above, and then take the intersection with $(\mathbb{C}^*)^{n+1}$. Thus, we obtain a very affine variety, which we denote by $X^\circ$.

Lemma 2.6. The ML degree of $X$ as a projective variety is equal to the ML degree of $X^\circ$ as a very affine variety, i.e.,
$$\mathrm{MLdeg}(X) = \mathrm{MLdeg}^\circ(X^\circ). \tag{6}$$

Proof. Fix general $u_0, u_1, \dots, u_n \in \mathbb{C}$. The ML degree of $X$ is defined to be the number of critical points of the likelihood function $(p_0/p_s)^{u_0}(p_1/p_s)^{u_1} \cdots (p_n/p_s)^{u_n}$. The ML degree of $X^\circ$ is defined to be the number of critical points of $z_0^{u_0} z_1^{u_1} \cdots z_n^{u_n}$. The two functions are equal on $X^\circ$. Therefore, they have the same number of critical points. □

For the rest of this section, by maximum likelihood degree we always mean maximum 
likelihood degree of very affine varieties. 

As observed in [2], the maximum likelihood degree is equal to the Gaussian degree defined by Franecki and Kapranov [7]. The main theorem of [7] relates the Gaussian degree to Euler characteristics. In this section, we review their main result together with the explicit formula from [6] for computing characteristic cycles.

First, we follow the notation in [2]. Fix a positive integer $n$ for the dimension of the ambient space of the very affine variety. Denote the affine torus $(\mathbb{C}^*)^n$ by $G$ and denote its Lie algebra by $\mathfrak{g}$. Let $T^*G$ be the cotangent bundle of $G$. Thus, $T^*G$ is a $2n$-dimensional space with a canonical symplectic structure. For any $\gamma \in \mathfrak{g}^*$, let $\Gamma_\gamma \subset T^*G$ be the graph of the corresponding left invariant 1-form on $G$.

Suppose $\Lambda \subset T^*G$ is a Lagrangian subvariety of $T^*G$. For a generic $\gamma \in \mathfrak{g}^*$, the intersection $\Lambda \cap \Gamma_\gamma$ is transverse and consists of finitely many points. The number of points in $\Lambda \cap \Gamma_\gamma$ is constant when $\gamma$ is contained in a nonempty Zariski open subset of $\mathfrak{g}^*$. This number is called the Gaussian degree of $\Lambda$, and is denoted by $\mathrm{gdeg}(\Lambda)$.
Let $X \subset G$ be an irreducible closed subvariety. Denote the conormal bundle of $X_{\mathrm{reg}}$ in $G$ by $T^*_{X_{\mathrm{reg}}}G$, and denote its closure in $T^*G$ by $T^*_X G$. Then $T^*_X G$ is an irreducible conic Lagrangian subvariety of $T^*G$. In the language of [18], $T^*_X G$ is the likelihood correspondence. Given any $\gamma \in \mathfrak{g}^*$, the left invariant 1-form corresponding to $\gamma$ degenerates at a point $P \in X_{\mathrm{reg}}$ if and only if $\Gamma_\gamma \cap T^*_X G$ contains a point in $T^*_P G$. Thus, we have the following lemma.

Lemma 2.7 ([2]). Let $X \subset G$ be an irreducible closed subvariety, where $G = (\mathbb{C}^*)^n$. Then
$$\mathrm{MLdeg}^\circ(X) = \mathrm{gdeg}(T^*_X G). \tag{7}$$

Let $\mathcal{F}$ be a bounded constructible complex on $G$ and let $CC(\mathcal{F})$ be its characteristic cycle. Then $CC(\mathcal{F}) = \sum_{1 \leq j \leq l} n_j \cdot [\Lambda_j]$ is a $\mathbb{Z}$-linear combination of irreducible conic Lagrangian subvarieties in the cotangent bundle $T^*G$. The Gaussian degree and Euler characteristic are related by the following theorem.

Theorem 2.8 ([7]). With the notation above, we have
$$\chi(G, \mathcal{F}) = \sum_{1 \leq j \leq l} n_j \cdot \mathrm{gdeg}(\Lambda_j). \tag{8}$$

2.3. Euler obstructions. The Euler obstructions are defined to be the coefficients of a characteristic cycle decomposition (see equation (9)), and it is a theorem of Kashiwara that they can be computed as the Euler characteristic of a complex link (see Theorem 2.10).

Let $\bigsqcup_{1 \leq i \leq k} S_i$ be a Whitney stratification of the very affine variety $X$ with $S_1 = X_{\mathrm{reg}}$. Let $e_{i1}$ be the Euler obstruction of the pair $(S_i, X_{\mathrm{reg}})$, which measures the singular behavior of $X$ along $S_i$. More precisely, the $e_{i1}$ are defined such that the following equality holds (see [6, 1.1] for more details):
$$CC(\mathbb{C}_{S_1}) = e_{11}[T^*_X G] + \sum_{2 \leq i \leq k} e_{i1}[T^*_{\overline{S_i}} G]. \tag{9}$$
For example, $e_{11} = (-1)^{\dim X}$.

Since $\chi(S_1) = \chi(G, \mathbb{C}_{S_1})$, combining equations (7) to (9), we obtain the following corollary, which expresses $\chi(S_1)$ as a weighted sum of ML degrees.

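Corollary 2.9. Let $\bigsqcup_{1 \leq i \leq k} S_i$ be a Whitney stratification of $X$ with $S_1 = X_{\mathrm{reg}}$. Then
$$\chi(X_{\mathrm{reg}}) = e_{11}\, \mathrm{MLdeg}^\circ(X) + \sum_{2 \leq i \leq k} e_{i1}\, \mathrm{MLdeg}^\circ(\overline{S_i}),$$
where $\overline{S_i}$ denotes the closure of $S_i$ in $X$ and $e_{i1}$ is the Euler obstruction of the pair $(S_i, S_1)$.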

Even though the abstract definition of the Euler obstruction uses characteristic cycles, there is a concrete topological formula for Euler obstructions due to Kashiwara. Here we recall that the Euler characteristic with compact support of a topological space $M$ is defined by
$$\chi_c(M) = \sum_i (-1)^i \dim H^i_c(M, \mathbb{Q}),$$
where $H^i_c$ is the $i$-th cohomology with compact support. When $M$ is a $d$-dimensional (possibly non-compact) orientable manifold, Poincaré duality for $M$ implies that
$$\dim H^i_c(M, \mathbb{Q}) = \dim H^{d-i}(M, \mathbb{Q}),$$
and hence $\chi_c(M) = (-1)^d \chi(M)$.

Theorem 2.10 (Kashiwara¹). Under the above notation, fix a point $z \in S_i$ and an embedding of a neighborhood of $z$ in $X$ into an affine space $W$. Let $V$ be an affine subspace of $W$ with $\dim V + \dim S_i = \dim W$, which intersects $S_i$ transversally at $z$.

Then the Euler obstruction can be computed using the Euler characteristic with compact support:
$$e_{i1} = (-1)^{\dim S_i - 1}\, \chi_c\big(B \cap X_{\mathrm{reg}} \cap \varphi^{-1}(\epsilon)\big), \tag{10}$$
where $B$ is a ball of radius $\delta$ in $W$ centered at $z$, $\varphi$ is a general linear function on $V$ vanishing at $z$, and $0 < |\epsilon| \ll \delta \ll 1$.

The affine subspace $V$ is called a "normal slice" of $S_i$, and the intersection $B \cap X_{\mathrm{reg}} \cap \varphi^{-1}(\epsilon)$ is called a "complex link" of the pair $S_i, X_{\mathrm{reg}}$. By the formula above, every Euler obstruction we consider can be computed topologically.

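Example 2.11. Consider the variety $X$ of rank at most 2 matrices in $\mathbb{P}^9$ defined by
$$\det[p_{ij}]_{3 \times 3} = 0, \qquad p_{11} + p_{12} + \cdots + p_{33} - p_s = 0.$$
Denote the very affine subvariety of $X$ by $X^\circ$, that is,
$$X^\circ := \{p \in X \mid p_{ij} \neq 0 \text{ for all } i, j \text{ and } p_s \neq 0\}.$$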

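The Whitney stratification of $X^\circ$ consists of $S_1 = X^\circ_{\mathrm{reg}}$ and $S_2$, the singular locus of $X^\circ$, which consists of the rank 1 matrices. By Corollary 2.9, we have
$$\chi(X^\circ_{\mathrm{reg}}) = e_{11}\, \mathrm{MLdeg}^\circ(X^\circ) + e_{21}\, \mathrm{MLdeg}^\circ(S_2). \tag{11}$$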

With this equation, we determine the ML degree of $X$. The rank 1 matrices are known to have ML degree one, so $\mathrm{MLdeg}^\circ(S_2) = 1$. The Euler obstruction $e_{21}$ is equal, up to sign, to the Euler characteristic of the complex link, as in Theorem 2.10. In this case, the complex link turns out to be homeomorphic to a vector bundle over $\mathbb{P}^1$ (see Lemma 3.2 for more details), so its Euler characteristic is $\chi(\mathbb{P}^1) = 2$. The sign in front of the Euler characteristic is negative, and hence $e_{21} = -2$.

The Euler obstruction $e_{11}$ is much easier to determine because it always equals $(-1)^{\dim X}$. So here $e_{11} = -1$. In Subsection 3.2, we will calculate the Euler characteristic $\chi(X^\circ_{\mathrm{reg}})$. In fact, $\chi(X^\circ_{\mathrm{reg}}) = -12$. Therefore, (11) reads $-12 = -\mathrm{MLdeg}(X) - 2$, so $\mathrm{MLdeg}(X) = 10$, concluding the example.
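The last step of Example 2.11 is a one-line linear solve. The Python sketch below (hypothetical helper `top_ml_degree`) isolates $\mathrm{MLdeg}^\circ(X)$ in Corollary 2.9, given the Euler characteristic of the regular part, the Euler obstructions, and the ML degrees of the closures of the smaller strata:

```python
from fractions import Fraction

def top_ml_degree(chi_reg, e11, lower_terms):
    """Solve chi(X_reg) = e11*MLdeg(X) + sum_i e_i1*MLdeg(closure of S_i)
    for MLdeg(X); lower_terms is a list of (e_i1, MLdeg) pairs for i >= 2."""
    rest = sum(e * d for e, d in lower_terms)
    value = Fraction(chi_reg - rest, e11)
    if value.denominator != 1:
        raise ValueError("inconsistent inputs: an ML degree must be an integer")
    return int(value)

# Example 2.11: chi(X_reg) = -12, e11 = -1, e21 = -2, MLdeg(S2) = 1.
print(top_ml_degree(-12, -1, [(-2, 1)]))   # 10
```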

In Example 2.11, we used Corollary 2.9 and topological computations to determine the ML 
degree of a singular variety. In the next section we will again use Corollary 2.9 and topological 
computations to determine ML degrees. 


¹Here the formula is in the form of [6, Theorem 1.1]; see also [4, Page 100] and [9, 8.1].
3. The ML degree for rank 2 matrices


To $\mathbb{P}^{mn}$ we associate the coordinates $p_{11}, \dots, p_{1n}, \dots, p_{m1}, \dots, p_{mn}, p_s$. Let $X_{mn}$ denote the variety defined by $p_{11} + \cdots + p_{mn} - p_s = 0$ and the vanishing of the $3 \times 3$ minors of the matrix
$$\begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ \vdots & \vdots & & \vdots \\ p_{(m-1)1} & p_{(m-1)2} & \cdots & p_{(m-1)n} \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{pmatrix}. \tag{12}$$
We think of $X_{mn}$ as the Zariski closure of the set of rank 2 matrices in the distinguished hyperplane of $\mathbb{P}^{mn}$. Let $Z_{mn}$ be the subvariety of $X_{mn}$ defined by the vanishing of the $2 \times 2$ minors of the matrix (12). Then $Z_{mn}$ is the singular locus of $X_{mn}$ for $m, n \geq 3$.

Denote the very affine varieties associated to $X_{mn}$ and $Z_{mn}$ by $X^\circ_{mn}$ and $Z^\circ_{mn}$, respectively.

We will make topological computations to determine the ML degree of $X_{mn}$. In summary: according to Lemma 2.6, $\mathrm{MLdeg}(X_{mn}) = \mathrm{MLdeg}^\circ(X^\circ_{mn})$. Proposition 3.1 gives a Whitney stratification of $X_{mn}$ and determines the Euler obstructions for this stratification. Since Whitney regularity and Euler obstructions are local invariants, we obtain a Whitney stratification of $X^\circ_{mn}$ and its Euler obstructions. Using these computations, we reduce our problem, by Corollary 3.3, to determining a single Euler characteristic $\chi(X^\circ_{mn} \setminus Z^\circ_{mn})$. Next, Theorem 3.4 provides a closed form expression for $\chi(X^\circ_{mn} \setminus Z^\circ_{mn})$, for fixed $m$, in terms of the elements of a finite sequence $\mathcal{A}_m$ of integers. We conclude this section by computing $\mathcal{A}_m$ for $m = 3$, thereby proving Theorem 3.12.


3.1. Calculating Euler obstructions. We will start by proving general results for $m, n \geq 2$. At the end of this section, we will specialize to the case $m = 3$. With some topological computations, we determine the terms $A_1, A_2$ of $\mathcal{A}_3$, thereby giving a closed form expression for the ML degree of $X_{3n}$ [Theorem 3.12].

To ease notation, we let $e^{(mn)}$ denote the Euler obstruction $e_{21}$ of the pair $\{Z_{mn}, X_{mn} \setminus Z_{mn}\}$, which is a Whitney stratification of $X_{mn}$ by Corollary 2.4, and we set
$$Y_{mn} := X^\circ_{mn} \setminus Z^\circ_{mn}.$$

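Proposition 3.1. The pair $\{Z_{mn}, X_{mn} \setminus Z_{mn}\}$ is a Whitney stratification of $X_{mn}$. Denote the Euler obstruction of this pair of strata by $e^{(mn)}$. Then
$$e^{(mn)} = (-1)^{m+n-1}(\min\{m,n\} - 1). \tag{13}$$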

Proof. Without loss of generality, we assume that $m \leq n$. When $m = 2$, $X_{mn}$ is the distinguished hyperplane defined by $p_{11} + p_{12} + \cdots + p_{mn} = p_s$, and $Z_{mn} \subset X_{mn}$ is a smooth subvariety. Thus, the first part of the proposition follows from item (5) of Example 2.2. Moreover, it follows from the definition of the Euler obstruction that $e^{(mn)} = (-1)^{\dim Z_{mn} + 1}$. Since $\dim Z_{mn} = m + n - 2$, the second part of the proposition follows. Therefore, we can assume $m \geq 3$.

We will compute the Euler obstructions using Theorem 2.10. Since $X_{mn}$ is contained in the distinguished hyperplane, we consider $X_{mn}$ as a subvariety of the projective space $\mathbb{P}^{mn-1}$ of all $m \times n$ matrices. To simplify notation, we write $X$ and $Z$ instead of $X_{mn}$ and $Z_{mn}$. Then $S_1 = X \setminus Z$ and $S_2 = Z$ form a Whitney stratification of $X$ according to Corollary 2.4. Next, we compute the Euler obstruction $e^{(mn)}$ of the pair $Z, X \setminus Z$ using Theorem 2.10.





One can easily compute that $\dim Z = m + n - 2$. In the notation of Theorem 2.10, since $B \cap (X \setminus Z) \cap \varphi^{-1}(\epsilon)$ is a complex manifold, it is orientable and even-dimensional, and hence
$$\chi_c(B \cap (X \setminus Z) \cap \varphi^{-1}(\epsilon)) = \chi(B \cap (X \setminus Z) \cap \varphi^{-1}(\epsilon))$$
by Poincaré duality. Since the Euler characteristic is a homotopy invariant and $\chi(\mathbb{P}^{m-2}) = m - 1$, the second part of the proposition follows from the following lemma, where we show that the link is homotopy equivalent to $\mathbb{P}^{m-2}$. □

Lemma 3.2. Suppose $m \leq n$. With the above notation, $B \cap (X \setminus Z) \cap \varphi^{-1}(\epsilon)$ is homotopy equivalent to $\mathbb{P}^{m-2}$.

Proof. First, we give a concrete description of the normal slice $V$. Notice that $X \subset \mathbb{P}^{mn}$ is contained in the distinguished hyperplane $p_{11} + \cdots + p_{mn} - p_s = 0$. In this proof, we consider $X$ as a subvariety of $\mathbb{P}^{mn-1}$ with homogeneous coordinates $p_{11}, \dots, p_{mn}$. Denote the affine chart $p_{11} \neq 0$ of $\mathbb{P}^{mn-1}$ by $U_{11}$. Let $a_{ij} = p_{ij}/p_{11}$, for $(i,j) \neq (1,1)$, be the affine coordinates of $U_{11}$, and let $a_{11} = 1$. Denote the origin of $U_{11}$ by $O$. Let $O$ be the fixed point $z$ in Theorem 2.10, and let $U_{11}$ be the affine space $W$.

Now, we define a projection $\pi : U_{11} \to Z \cap U_{11}$ by $[a_{ij}] \mapsto [b_{ij}]$, where $b_{ij} = a_{i1} \cdot a_{1j}$ and $b_{11} = 1$. Then $U_{11}$ becomes a vector bundle over $Z \cap U_{11}$ via $\pi$. The preimage of $O$ is the vector space parametrized by $a_{ij}$ with $2 \leq i \leq m$, $2 \leq j \leq n$.

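In terms of matrices, we can think of $\pi$ as the following map:
$$\begin{pmatrix} 1 & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \mapsto \begin{pmatrix} 1 \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix} \begin{pmatrix} 1 & a_{12} & \cdots & a_{1n} \end{pmatrix} = \begin{pmatrix} 1 & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{21}a_{12} & \cdots & a_{21}a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m1}a_{12} & \cdots & a_{m1}a_{1n} \end{pmatrix},$$
and we think of the preimage of $O$ as the set of matrices
$$\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & a_{m2} & \cdots & a_{mn} \end{pmatrix} \mapsto \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}.$$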

By the above construction, we can take the normal slice $V$ at $O$ to be the fiber $\pi^{-1}(O)$. The intersection $V \cap X$ is clearly isomorphic to the affine variety $\{[a_{ij}]_{2 \leq i \leq m, 2 \leq j \leq n} \mid \mathrm{rank} \leq 1\}$. Thus, we can define a map $\rho : V \cap (X \setminus Z) \to \mathbb{P}^{m-2}$ that sends the matrix $[a_{ij}]_{2 \leq i \leq m, 2 \leq j \leq n}$ to one of its nonzero column vectors, regarded as an element of $\mathbb{P}^{m-2}$. Since the rank of $[a_{ij}]_{2 \leq i \leq m, 2 \leq j \leq n}$ is 1, the map does not depend on which nonzero column vector we choose. Using basic linear algebra, it is straightforward to check the following two statements about $\rho$.

• The restriction of $\rho$ to $B \cap (X \setminus Z) \cap \varphi^{-1}(\epsilon)$ is surjective.

• The restriction of $\rho$ to $B \cap (X \setminus Z) \cap \varphi^{-1}(\epsilon)$ has convex fibers.
Thus, $\rho$ is a fiber bundle with contractible fibers. By choosing a section, one can fiberwise contract the total space of the fiber bundle to the chosen section. Hence $\rho$ induces a homotopy equivalence between $B \cap (X \setminus Z) \cap \varphi^{-1}(\epsilon)$ and $\mathbb{P}^{m-2}$. For a more general and technical topological statement, see the main theorem of [23]. □

Corollary 3.3. Let $X_{mn}$ and $Y_{mn}$ be defined as at the beginning of this section. Then
$$\chi(Y_{mn}) = -\mathrm{MLdeg}(X_{mn}) + (-1)^{m+n-1}(\min\{m,n\} - 1). \tag{14}$$

Proof. By Example 1.5 and the discussion following it, $\mathrm{MLdeg}(Z_{mn}) = 1$. One can easily compute that $\dim X^\circ_{mn} = 2m + 2n - 5$, which is odd, so $e_{11} = -1$. Now the corollary follows from (13) and Corollary 2.9. □
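Equation (14) is linear in the ML degree, so it can be inverted once $\chi(Y_{mn})$ is known. A minimal Python sketch (function names hypothetical) that encodes (13) and (14), checks the case $m = n = 3$ against Example 2.11, and lists the Euler characteristics that (14) and the formula $2^{n+1} - 6$ together imply for $m = 3$:

```python
def euler_obstruction(m, n):
    """e^(mn) from (13): (-1)^(m+n-1) * (min(m, n) - 1)."""
    return (-1) ** (m + n - 1) * (min(m, n) - 1)

def chi_from_ml_degree(m, n, ml_degree):
    """Right-hand side of (14): chi(Y_mn) = -MLdeg(X_mn) + e^(mn)."""
    return -ml_degree + euler_obstruction(m, n)

def ml_degree_from_chi(m, n, chi_y):
    """Invert (14): MLdeg(X_mn) = -chi(Y_mn) + e^(mn)."""
    return -chi_y + euler_obstruction(m, n)

print(ml_degree_from_chi(3, 3, -12))  # 10, as in Example 2.11
# chi(Y_3n) implied by (14) and MLdeg = 2^(n+1) - 6, for n = 3..7:
print([chi_from_ml_degree(3, n, 2 ** (n + 1) - 6) for n in range(3, 8)])
# [-12, -24, -60, -120, -252]
```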


3.2. Calculating Euler characteristics and ML degrees. In this subsection, we give an expression for the Euler characteristic $\chi(Y_{mn})$, which determines formulas for ML degrees.

Recall that $X^\circ_{mn}$ is the complement of all the coordinate hyperplanes in $X_{mn}$, and that $Z_{mn} \subset X_{mn}$ is the subvariety corresponding to rank 1 matrices of size $m \times n$.


Theorem 3.4. Fix $m$ to be an integer greater than two. Then there exists a sequence, denoted $\mathcal{A}_m$, of integers $A_1, A_2, \dots, A_{m-1}$ such that


(15) 


1 


A,- 

i + l 


i \ 


for n >2. 


Before proving Theorem 3.4, we quote a hyperplane arrangement result, which follows immediately from the theorem of Orlik-Solomon (see, e.g., [20, Theorem 5.90]).

Lemma 3.5. Let $L_1, \dots, L_r$ be distinct hyperplanes in $\mathbb{C}^s$. Suppose they are in general position, that is, the intersection of any $t$ hyperplanes from $\{L_1, \dots, L_r\}$ has codimension $t$ for any $1 \le t \le s$. Denote the complement of $L_1 \cup \dots \cup L_r$ in $\mathbb{C}^s$ by $M$. Then

• if $r = s + 1$, then $\chi(M) = (-1)^s$;

• if $r = s + 2$, then $\chi(M) = (-1)^s (s + 1)$.
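As a quick numerical sanity check of Lemma 3.5 (and of its later use for $T_n$ in Lemma 3.7), the Python sketch below assumes the standard fact, not proved in this paper, that the complement of $r$ hyperplanes in general position in $\mathbb{C}^s$ has Poincaré polynomial $\sum_{k=0}^{s} \binom{r}{k} t^k$, so that its Euler characteristic is this polynomial evaluated at $t = -1$. The function name `generic_complement_euler` is our own.

```python
from math import comb


def generic_complement_euler(r: int, s: int) -> int:
    """Euler characteristic of C^s minus r hyperplanes in general position.

    Assumes the Poincare polynomial sum_{k <= s} C(r, k) t^k of a generic
    arrangement and evaluates it at t = -1.
    """
    return sum((-1) ** k * comb(r, k) for k in range(min(r, s) + 1))


# The two cases of Lemma 3.5, checked for small s.
for s in range(1, 10):
    assert generic_complement_euler(s + 1, s) == (-1) ** s
    assert generic_complement_euler(s + 2, s) == (-1) ** s * (s + 1)

# T_n from Lemma 3.7: n coordinate hyperplanes in {sum t_j = 1}, a copy of C^(n-1).
print([generic_complement_euler(n, n - 1) for n in range(2, 6)])  # [-1, 1, -1, 1]
```

For $s = 1$ the formula reduces to $1 - r$, the Euler characteristic of $\mathbb{C}$ minus $r$ points, which is the case used repeatedly for the fibers $F_x$ below.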


We will use the preceding lemma to compute Euler characteristics of strata of the following stratifications. Throughout, we assume that $m$ is fixed and, to simplify indices, write $\mathcal{Y}_n$ for $\mathcal{Y}_{mn}$, the rank 2 matrices with nonzero coordinates whose entries sum to 1. In other words,

$$\mathcal{Y}_n = \Big\{ [a_{ij}]_{1 \le i \le m,\, 1 \le j \le n} \;\Big|\; a_{ij} \in \mathbb{C}^*,\ \sum_{i,j} a_{ij} = 1,\ \operatorname{rank}[a_{ij}] = 2 \Big\}.$$


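A stratification of $\mathcal{Y}_n$ is given by the number of columns of a matrix that sum to zero. Defining

$$\mathcal{Y}_n^{(l)} := \Big\{ [a_{ij}] \in \mathcal{Y}_n \;\Big|\; \text{exactly } l \text{ of the column sums } \textstyle\sum_{1 \le i \le m} a_{ij} \text{ are zero} \Big\}$$

yields the stratification

$$\mathcal{Y}_n = \mathcal{Y}_n^{(0)} \cup \dots \cup \mathcal{Y}_n^{(n-1)},$$

where each $\mathcal{Y}_n^{(l)}$ is a locally closed subvariety of $\mathcal{Y}_n$. Note that by definition, $\mathcal{Y}_n^{(0)}$ is the set of rank 2 matrices with nonzero column sums, i.e.,

$$\mathcal{Y}_n^{(0)} := \Big\{ [a_{ij}]_{1 \le i \le m,\, 1 \le j \le n} \in \mathcal{Y}_n \;\Big|\; \sum_{1 \le i \le m} a_{ij} \ne 0 \text{ for each } 1 \le j \le n \Big\}.$$

The following lemma shows that $\chi(\mathcal{Y}_n) = \chi(\mathcal{Y}_n^{(0)})$ by proving $\chi(\mathcal{Y}_n^{(l)}) = 0$ for $l \ge 1$.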


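Lemma 3.6. The Euler characteristic of rank 2 matrices that sum to one equals the Euler characteristic of rank 2 matrices with nonzero column sums that sum to one. In other words, with notation as above,

$$\chi(\mathcal{Y}_n) = \chi(\mathcal{Y}_n^{(0)}) \quad\text{and}\quad \chi(\mathcal{Y}_n^{(1)}) = \dots = \chi(\mathcal{Y}_n^{(n-1)}) = 0. \tag{16}$$

Proof of Lemma. We define a $\mathbb{C}^*$ action on $\mathcal{Y}_n$ by putting $t \cdot (a_{ij}) = (a'_{ij})$, where

$$a'_{ij} = \begin{cases} a_{ij} & \text{if } \sum_{1 \le i \le m} a_{ij} \ne 0, \\ t \cdot a_{ij} & \text{if } \sum_{1 \le i \le m} a_{ij} = 0. \end{cases}$$

It is straightforward to check that the action preserves each $\mathcal{Y}_n^{(l)}$. The action is continuous, and for any $l \ge 1$ it has no fixed points on $\mathcal{Y}_n^{(l)}$: every matrix in $\mathcal{Y}_n^{(l)}$ has a column summing to zero, and scaling that column (whose entries are nonzero) by $t \ne 1$ changes the matrix. Therefore, $\chi(\mathcal{Y}_n^{(l)}) = 0$ for any $l \ge 1$, and hence

$$\chi(\mathcal{Y}_n) = \chi(\mathcal{Y}_n^{(0)}).$$

□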


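A column sum of a matrix in $\mathcal{Y}_n^{(0)}$ can be any element of $\mathbb{C}^*$. Now we consider a subset of $\mathcal{Y}_n^{(0)}$ where the column sums are all exactly 1. Let $V_n$ denote the set of rank 2 matrices with column sums equal to 1, i.e.,

$$V_n := \Big\{ [b_{ij}]_{1 \le i \le m,\, 1 \le j \le n} \;\Big|\; b_{ij} \in \mathbb{C}^*,\ \sum_{1 \le i \le m} b_{ij} = 1 \text{ for each } 1 \le j \le n,\ \operatorname{rank}([b_{ij}]) = 2 \Big\}.$$

In Lemma 3.7, we express the Euler characteristic of $\mathcal{Y}_n^{(0)}$ in terms of $\chi(V_n)$.

Lemma 3.7.

$$\chi(\mathcal{Y}_n^{(0)}) = (-1)^{n-1} \chi(V_n). \tag{17}$$

Proof of Lemma. Let $T_n = \{(t_j)_{1 \le j \le n} \in (\mathbb{C}^*)^n \mid \sum_j t_j = 1\}$. Define a map $F: T_n \times V_n \to \mathcal{Y}_n^{(0)}$ by putting $a_{ij} = t_j b_{ij}$; we think of the $j$th element of $T_n$ as scaling the $j$th column of $V_n$. Clearly, $F$ is an isomorphism, whose inverse sends $[a_{ij}]$ to its column sums $t_j = \sum_i a_{ij}$ and the normalized matrix $b_{ij} = a_{ij}/t_j$. Therefore, $\chi(\mathcal{Y}_n^{(0)}) = \chi(T_n) \cdot \chi(V_n)$. The variety $T_n$ is the complement of the $n$ hyperplanes $\{t_j = 0\}$, which are in general position, in the affine hyperplane $\{\sum_j t_j = 1\} \cong \mathbb{C}^{n-1}$. By Lemma 3.5, $\chi(T_n) = (-1)^{n-1}$, and hence

$$\chi(\mathcal{Y}_n^{(0)}) = (-1)^{n-1} \chi(V_n). \qquad \square$$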
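The isomorphism $F: T_n \times V_n \to \mathcal{Y}_n^{(0)}$ in the proof of Lemma 3.7 is explicit, and its inverse simply reads off the column sums. The Python sketch below, using exact rational arithmetic, illustrates both directions on a hypothetical $3 \times 3$ rank 2 matrix of our own choosing (its third column is twice the first plus the second); the helper names `split_columns` and `join_columns` are ours.

```python
from fractions import Fraction as Fr


def split_columns(a):
    """Inverse of F: return (t, b) with t_j the j-th column sum and b_ij = a_ij / t_j."""
    m, n = len(a), len(a[0])
    t = [sum(a[i][j] for i in range(m)) for j in range(n)]
    b = [[a[i][j] / t[j] for j in range(n)] for i in range(m)]
    return t, b


def join_columns(t, b):
    """The map F from the proof of Lemma 3.7: scale the j-th column of b by t_j."""
    return [[t[j] * row[j] for j in range(len(t))] for row in b]


# Hypothetical point of Y_3^(0): nonzero entries, total sum 1, nonzero column sums, rank 2.
a = [[Fr(1, 13), Fr(2, 13), Fr(4, 13)],
     [Fr(1, 13), Fr(-1, 13), Fr(1, 13)],
     [Fr(1, 13), Fr(1, 13), Fr(3, 13)]]

t, b = split_columns(a)
assert sum(t) == 1                                            # t lies in T_3
assert all(sum(row[j] for row in b) == 1 for j in range(3))   # b has unit column sums
assert join_columns(t, b) == a                                # F(t, b) recovers a
```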
A stratification of $V_n$ can be given by the minimal $j_0$ such that the column vector $[b_{ij_0}]$ is linearly independent from $[b_{i1}]$. Since $\sum_i b_{ij} = 1$ for all $j$, if two column vectors are linearly dependent, they must be equal. Defining

$$V_n^{(l)} := \{ [b_{ij}] \in V_n \mid \text{the column vectors satisfy } [b_{i1}] = [b_{i2}] = \dots = [b_{i(n-l+1)}] \ne [b_{i(n-l+2)}] \}$$

yields the stratification of $V_n$ by locally closed subvarieties

$$V_n = \bigcup_{l=2}^{n} V_n^{(l)} = V_n^{(2)} \cup \dots \cup V_n^{(n)}.$$

Hence

$$\chi(V_n) = \chi(V_n^{(2)}) + \chi(V_n^{(3)}) + \dots + \chi(V_n^{(n)}).$$

We use $W_l$ to denote the $m \times l$ rank 2 matrices with column sums equal to 1 such that the first two column vectors are linearly independent. Then we have the isomorphism $V_n^{(l)} \cong W_l$ given by the following map, which deletes the $n - l$ repeated copies of the first column:

$$[\, \underbrace{b_{i1}\; b_{i1}\; \cdots\; b_{i1}}_{(n-l+1) \text{ copies of } b_{i1}}\; b_{i(n-l+2)}\; \cdots\; b_{in} \,] \longmapsto [\, \underbrace{b_{i1}\; b_{i(n-l+2)}\; \cdots\; b_{in}}_{l \text{ columns}} \,].$$

Therefore,

$$\chi(V_n) = \chi(W_2) + \chi(W_3) + \dots + \chi(W_n). \tag{18}$$

For any $l \ge 2$, we can define a map $\pi_l: W_l \to W_2$ by taking the first two columns. Thus, we can consider all $W_l$ as varieties over $W_2$. The following lemma gives a topological description of $W_l$, which will be useful to compute its Euler characteristic.

Lemma 3.8. For any $l \ge 2$,

$$W_l \cong W_3 \times_{W_2} W_3 \times_{W_2} \dots \times_{W_2} W_3,$$

where there are $l - 2$ copies of $W_3$ on the right-hand side and the product is the topological fiber product. In other words, for any point $x \in W_2$, the fiber of $\pi_l: W_l \to W_2$ over $x$ is equal to the $(l-2)$-th power of the fiber of $\pi_3: W_3 \to W_2$ over $x$.

Proof of Lemma. Take $l - 2$ elements of $W_3$, and suppose they all belong to the same fiber of $\pi_3: W_3 \to W_2$. This means that we have $l - 2$ matrices of size $m \times 3$ and rank 2, which all have the same first two columns. We can then collect the third column of each matrix and put them after the common first two columns. Thus we obtain an $m \times l$ matrix whose rank is still 2, since each of the new columns lies in the span of the first two. In this way, we obtain a map $W_3 \times_{W_2} W_3 \times_{W_2} \dots \times_{W_2} W_3 \to W_l$,

$$\underbrace{[\, b_{i1}\; b_{i2}\; b_{i3} \,],\ [\, b_{i1}\; b_{i2}\; b_{i4} \,],\ \dots,\ [\, b_{i1}\; b_{i2}\; b_{il} \,]}_{(l-2)\ m \times 3 \text{ matrices}} \longmapsto [\, b_{i1}\; b_{i2}\; b_{i3}\; \cdots\; b_{il} \,],$$

which is clearly an isomorphism. □
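To make the gluing map in the proof of Lemma 3.8 concrete, here is a Python sketch over the rationals. Points of the fiber over $x$ are produced as $T \cdot [b_{i1}] + (1 - T) \cdot [b_{i2}]$, as in the proof of Lemma 3.9 below; the $3 \times 2$ matrix `x` and the parameters $T = 2, -1$ are hypothetical choices of ours, and `rank`, `fiber_point`, `glue` are our own helpers.

```python
from fractions import Fraction as Fr


def rank(rows):
    """Rank of a matrix of Fractions via Gaussian elimination."""
    mat = [list(r) for r in rows]
    rk = 0
    for c in range(len(mat[0])):
        piv = next((i for i in range(rk, len(mat)) if mat[i][c] != 0), None)
        if piv is None:
            continue
        mat[rk], mat[piv] = mat[piv], mat[rk]
        for i in range(len(mat)):
            if i != rk and mat[i][c] != 0:
                f = mat[i][c] / mat[rk][c]
                mat[i] = [u - f * v for u, v in zip(mat[i], mat[rk])]
        rk += 1
    return rk


def fiber_point(x, T):
    """m x 3 matrix over x whose third column is T*[b_i1] + (1 - T)*[b_i2]."""
    return [[b1, b2, T * b1 + (1 - T) * b2] for b1, b2 in x]


def glue(points):
    """Lemma 3.8: (l-2) matrices with common first two columns -> one m x l matrix."""
    first = [row[:2] for row in points[0]]
    assert all([row[:2] for row in p] == first for p in points), "not in one fiber"
    return [first[i] + [p[i][2] for p in points] for i in range(len(first))]


x = [[Fr(4), Fr(5)], [Fr(-2), Fr(7)], [Fr(-1), Fr(-11)]]    # hypothetical x in W_2
W4 = glue([fiber_point(x, Fr(2)), fiber_point(x, Fr(-1))])  # a point of W_4 over x
assert rank(W4) == 2
assert all(sum(row[j] for row in W4) == 1 for j in range(4))
```

The parameters $T$ must avoid the finitely many values where some new entry vanishes; for this `x` those values are $5$, $7/9$ and $11/10$.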
Given a point $x \in W_2$, we let $F_x$ denote the fiber of $\pi_3: W_3 \to W_2$ over $x$. Since the Euler characteristic satisfies the product formula for fiber bundles, to compute the Euler characteristic of $W_l$ it suffices to study the stratification of $W_2$ by the Euler characteristic of $F_x$, and compute the Euler characteristic of each stratum. More precisely, let

$$W_2^{(k)} := \{ x \in W_2 \mid \chi(F_x) = -k \}, \quad\text{so}\quad W_2 = \bigsqcup_{k \in \mathbb{Z}} W_2^{(k)}.$$

Then, to compute $\chi(W_l)$, it suffices to compute $\chi(W_2^{(k)})$ for $k \in \mathbb{Z}$. According to Lemma 3.9, it suffices to consider $k \in \{0, 1, \dots, m-1\}$. For this reason, let $\Lambda_m$ denote the sequence $\Lambda_0, \Lambda_1, \dots, \Lambda_{m-1}$, where the $\Lambda_k$ are defined by

$$\Lambda_k := \chi(W_2^{(k)}) \quad\text{for } 0 \le k \le m-1. \tag{19}$$

As we will soon see in the proofs, we can consider $F_x$ as the complement of a point arrangement in $\mathbb{C}$. The arrangement is parametrized by the point $x \in W_2$. In other words, $W_2$ is naturally a parameter space of point arrangements in $\mathbb{C}$. According to the Euler characteristic of the corresponding arrangement complement, $W_2$ is canonically stratified. Our problem is to compute the Euler characteristic of each stratum. The main difficulty in generalizing our method to compute the ML degree of higher rank matrices is solving the corresponding problem for higher dimensional hyperplane arrangements.

Lemma 3.9. For any $x \in W_2$,

$$0 \ge \chi(F_x) \ge -(m - 1).$$

Moreover, the map $W_2 \to \mathbb{Z}$ defined by $x \mapsto \chi(F_x)$ is a semi-continuous function. In other words, for any integer $k$, $\{x \in W_2 \mid \chi(F_x) \ge -k\}$ is a closed algebraic subset of $W_2$. In particular, the subsets $W_2^{(k)} = \{x \in W_2 \mid \chi(F_x) = -k\}$, $0 \le k \le m - 1$, give a stratification of $W_2$ into locally closed subsets.


Proof of Lemma. By definition,

$$W_2 = \Big\{ [b_{ij}]_{1 \le i \le m,\, j = 1,2} \;\Big|\; b_{ij} \in \mathbb{C}^*,\ \sum_{1 \le i \le m} b_{i1} = \sum_{1 \le i \le m} b_{i2} = 1,\ \operatorname{rank}([b_{ij}]) = 2 \Big\}.$$

Fix an element $x = [b_{i1}\; b_{i2}] \in W_2$. By definition, the fiber $F_x$ of $\pi_3: W_3 \to W_2$ is equal to the following:

$$F_x = \Big\{ [b_{i3}] \in (\mathbb{C}^*)^m \;\Big|\; \sum_{1 \le i \le m} b_{i3} = 1,\ [b_{i3}] \text{ is contained in the linear span of } [b_{i1}] \text{ and } [b_{i2}] \Big\}.$$

Since the columns $[b_{i1}], [b_{i2}], [b_{i3}]$ each sum to one, for any $[b_{i3}] \in F_x$ there exists $T \in \mathbb{C}$ such that $[b_{i3}] = T \cdot [b_{i1}] + (1 - T) \cdot [b_{i2}]$. For each $i$, this means $b_{i3} \ne 0$ is equivalent to $b_{i2} + T \cdot (b_{i1} - b_{i2}) \ne 0$. Therefore, for $x = [b_{i1}\; b_{i2}]$,

$$F_x \cong \{ T \in \mathbb{C} \mid b_{i2} + T(b_{i1} - b_{i2}) \ne 0 \text{ for } i = 1, 2, \dots, m \} = \mathbb{C} \setminus \Big\{ \frac{b_{i2}}{b_{i2} - b_{i1}} \;\Big|\; 1 \le i \le m \text{ such that } b_{i1} - b_{i2} \ne 0 \Big\}. \tag{20}$$

Notice that $[b_{i1}] \ne [b_{i2}]$. Therefore, there has to be some $i$ such that $b_{i1} \ne b_{i2}$. Thus $F_x$ is isomorphic to $\mathbb{C}$ minus a set of between 1 and $m$ points, and hence the first part of the lemma follows.

The condition that $\chi(F_x) \ge -k$ is equivalent to the condition of some number of equalities $b_{i1} = b_{i2}$ and some number of overlaps among the points $b_{i2}/(b_{i2} - b_{i1})$. Those conditions can be expressed by algebraic equations. Thus, the locus of $x$ such that $\chi(F_x) \ge -k$ is a closed algebraic subset of $W_2$. □
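Formula (20) turns the computation of $\chi(F_x)$ into counting distinct points. A minimal Python sketch with exact fractions follows; the two $3 \times 2$ matrices are hypothetical points of $W_2$ for $m = 3$ (columns summing to 1, rank 2), chosen by us to realize the extreme values $0$ and $-(m-1) = -2$ allowed by Lemma 3.9.

```python
from fractions import Fraction as Fr


def punctures(x):
    """Points removed from C in (20): b_i2 / (b_i2 - b_i1) over rows with b_i1 != b_i2."""
    return {Fr(b2) / (Fr(b2) - Fr(b1)) for b1, b2 in x if b1 != b2}


def euler_fiber(x):
    """chi(F_x) = 1 - (number of punctures), since F_x is C minus finitely many points."""
    return 1 - len(punctures(x))


x_a = [[1, 1], [2, 3], [-2, -3]]      # rows 2 and 3 give the same point 3
x_b = [[4, 5], [-2, 7], [-1, -11]]    # three distinct points

assert punctures(x_a) == {Fr(3)}
assert euler_fiber(x_a) == 0
assert punctures(x_b) == {Fr(5), Fr(7, 9), Fr(11, 10)}
assert euler_fiber(x_b) == -2
```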

Example 3.10. When $m = 3$, the Euler characteristic $\chi(F_x)$ is in $\{0, -1, -2\}$. For example, if


Xo = 

1 

2 

1 

3 

, Xl = 

2 

1 

_1 

, X2 = 

4 

-2 

5 

7 


-2 

-3 


-2 

1 


-1 

-11 


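then $\chi(F_{x_0}) = 0$, $\chi(F_{x_1}) = -1$, and $\chi(F_{x_2}) = -2$. This is because $F_{x_0} = \mathbb{C} \setminus \{3\}$, $F_{x_1} = \mathbb{C} \setminus \{-5, 5\}$, and $F_{x_2} = \mathbb{C} \setminus \{5, \dots\}$. More generally, every matrix $x \in W_2$ has $\chi(F_x) \ge -2$. To have $x \in W_2$ such that $\chi(F_x) \ge -1$, at least one of the following six polynomials must vanish:

$$b_{31} - b_{32},\quad b_{21} - b_{22},\quad b_{11} - b_{12},\quad \det\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix},\quad \det\begin{bmatrix} b_{11} & b_{12} \\ b_{31} & b_{32} \end{bmatrix},\quad \det\begin{bmatrix} b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}.$$

The first three vanish when a row of $x$ has equal entries, and the three determinants vanish exactly when two of the points $b_{i2}/(b_{i2} - b_{i1})$ in (20) coincide. These conditions induce six irreducible components in $W_2$ whose points satisfy $\chi(F_x) \ge -1$. To have $x \in W_2$ such that $\chi(F_x) \ge 0$, two of the six polynomials above must vanish. However, this does not induce $15 = \binom{6}{2}$ components in $W_2$. This is because in twelve of these cases there are no points in $W_2$ that satisfy the conditions. Instead, there are only three components.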
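The criterion in Example 3.10 can be checked mechanically: for $m = 3$, the condition $\chi(F_x) \ge -1$ should coincide with the vanishing of at least one of the six polynomials. The Python sketch below tests this on randomly generated integer points of $W_2$; the sampler `random_point_of_W2` (which uses that two columns summing to 1 are dependent only if equal) and the other helper names are our own.

```python
import random
from fractions import Fraction as Fr
from itertools import combinations


def six_polynomials(x):
    """The six polynomials of Example 3.10 evaluated at a 3 x 2 matrix x."""
    diffs = [b1 - b2 for b1, b2 in x]
    minors = [x[i][0] * x[j][1] - x[i][1] * x[j][0]
              for i, j in combinations(range(3), 2)]
    return diffs + minors


def euler_fiber(x):
    """chi(F_x) from (20): 1 minus the number of distinct punctures."""
    pts = {Fr(b2) / (Fr(b2) - Fr(b1)) for b1, b2 in x if b1 != b2}
    return 1 - len(pts)


def random_point_of_W2(rng, bound=4):
    """Random integer 3 x 2 matrix with nonzero entries, column sums 1, rank 2."""
    while True:
        col1 = [rng.randint(-bound, bound) for _ in range(2)]
        col2 = [rng.randint(-bound, bound) for _ in range(2)]
        col1.append(1 - sum(col1))
        col2.append(1 - sum(col2))
        x = [[col1[i], col2[i]] for i in range(3)]
        if all(v != 0 for row in x for v in row) and col1 != col2:
            return x


rng = random.Random(0)
samples = [random_point_of_W2(rng) for _ in range(500)]
for x in samples:
    assert (euler_fiber(x) >= -1) == any(p == 0 for p in six_polynomials(x))
```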

Proof of Theorem 3.4. Thus far, we have shown with eqs. (16) to (18) the following relation:

$$\chi(\mathcal{Y}_n) = (-1)^{n-1} \sum_{2 \le l \le n} \chi(W_l). \tag{21}$$

Recall that $W_2^{(k)} = \{x \in W_2 \mid \chi(F_x) = -k\}$, and by Lemma 3.9 we have the stratification

$$W_2 = W_2^{(0)} \cup W_2^{(1)} \cup \dots \cup W_2^{(m-1)}. \tag{22}$$

Moreover, the $W_2^{(k)}$ are locally closed algebraic subsets of $W_2$. The projection $\pi_3: W_3 \to W_2$ induces a fiber bundle over each $W_2^{(k)}$, and by definition of $W_2^{(k)}$, the fiber has Euler characteristic $-k$. With Lemma 3.8, we showed that $W_l$ is isomorphic to an $(l-2)$-fold fiber product of $W_3$ over $W_2$. Thus, restricting $\pi_l: W_l \to W_2$ to $W_2^{(k)}$, the induced bundle's fiber has Euler characteristic $(-k)^{l-2}$. Then,

$$\chi\big(\pi_l^{-1}(W_2^{(k)})\big) = \Lambda_k \cdot (-k)^{l-2}. \tag{23}$$

Since the $W_2^{(k)}$ are locally closed algebraic subsets of $W_2$, the $\pi_l^{-1}(W_2^{(k)})$ are locally closed algebraic subsets of $W_l$. Therefore, the additivity of the Euler characteristic implies the following:

$$\chi(W_l) = \sum_{0 \le k \le m-1} \chi\big(\pi_l^{-1}(W_2^{(k)})\big) = \sum_{0 \le k \le m-1} \Lambda_k \cdot (-k)^{l-2}. \tag{24}$$

Here our convention is $0^0 = 1$.

By equations (21), (24) and Proposition 3.11, we have the following equalities:

$$\chi(\mathcal{Y}_n) = (-1)^{n-1} \sum_{2 \le l \le n} \Big( \sum_{0 \le k \le m-1} \Lambda_k (-k)^{l-2} \Big) = (-1)^{n-1} \sum_{0 \le k \le m-1} \Lambda_k \, \frac{1 - (-k)^{n-1}}{1 - (-k)}.$$

The last line becomes the same as (15) by replacing $k$ by $i$ and using $\Lambda_0 = 0$ from Proposition 3.11. □
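The computation above swaps two finite sums and evaluates a geometric series. As a check that no sign is lost, the Python sketch below compares the double sum coming from (21) and (24) with the closed form (15), using the sequences $\Lambda_3 = (-12, 12)$ and $\Lambda_4 = (50, -120, 72)$ listed in Table 2 together with $\Lambda_0 = 0$ from Proposition 3.11. Python's `0 ** 0 == 1` matches the convention $0^0 = 1$ in (24); the function names are ours.

```python
from fractions import Fraction as Fr


def chi_double_sum(lam, n):
    """(-1)^(n-1) * sum_{l=2}^{n} sum_k Lambda_k (-k)^(l-2), lam = [Lambda_0, ..., Lambda_{m-1}]."""
    return (-1) ** (n - 1) * sum(
        lam[k] * (-k) ** (l - 2) for l in range(2, n + 1) for k in range(len(lam)))


def chi_closed_form(lam, n):
    """Right-hand side of (15); the sum starts at i = 1 because Lambda_0 = 0."""
    return (-1) ** (n - 1) * sum(
        Fr(lam[i]) * (1 - (-i) ** (n - 1)) / (i + 1) for i in range(1, len(lam)))


for lam in ([0, -12, 12], [0, 50, -120, 72]):
    for n in range(2, 15):
        assert chi_double_sum(lam, n) == chi_closed_form(lam, n)
```

If $\Lambda_0$ were nonzero, the double sum would pick up an extra term $(-1)^{n-1}\Lambda_0$, which is why Proposition 3.11 is needed for (15).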


Proposition 3.11. Let $\Lambda_0$ of $\Lambda_m$ be defined as above. Then $\Lambda_0 = 0$.

We defer the proof of Proposition 3.11 to the end of the section. Now we specialize to the case $m = 3$ for our first main result.


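Theorem 3.12. [Main Result] For $n \ge 3$, the maximum likelihood degree of $X_{3n}$ is given by the following formula:

$$\operatorname{MLdeg}(X_{3n}) = 2^{n+1} - 6. \tag{25}$$

Proof. In [14], the ML degrees of $X_{32}$ and $X_{33}$ are determined to be 1 and 10, respectively. With this information, equations (14) and (15) for $n = 2, 3$ give $\Lambda_1 + \Lambda_2 = 0$ and $\Lambda_2 = 12$. Substituting $\Lambda_1 = -12$ and $\Lambda_2 = 12$ into (15) and (14) yields (25). □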
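The proof of Theorem 3.12 is a $2 \times 2$ linear solve followed by substitution. The Python sketch below reproduces it with exact fractions from the two inputs $\operatorname{MLdeg}(X_{32}) = 1$ and $\operatorname{MLdeg}(X_{33}) = 10$ quoted above, then checks (25) for a range of $n$; all helper names are ours.

```python
from fractions import Fraction as Fr


def chi_from_mldeg(mldeg, m, n):
    """Equation (14): chi(Y_mn) in terms of MLdeg(X_mn)."""
    return -mldeg + (-1) ** (m + n - 1) * (min(m, n) - 1)


def coeff(i, n):
    """Coefficient of Lambda_i in (15): ((-1)^(n-1) - i^(n-1)) / (i + 1)."""
    return Fr((-1) ** (n - 1) - i ** (n - 1), i + 1)


# (14) = (15) for m = 3 and n = 2, 3, in the unknowns Lambda_1, Lambda_2.
a11, a12, r1 = coeff(1, 2), coeff(2, 2), chi_from_mldeg(1, 3, 2)
a21, a22, r2 = coeff(1, 3), coeff(2, 3), chi_from_mldeg(10, 3, 3)
det = a11 * a22 - a12 * a21
lam1 = (r1 * a22 - a12 * r2) / det      # Cramer's rule
lam2 = (a11 * r2 - r1 * a21) / det
assert (lam1 + lam2, lam2) == (0, 12)

# Substituting back gives MLdeg(X_3n) = 2^(n+1) - 6 for n >= 3.
for n in range(3, 20):
    chi = lam1 * coeff(1, n) + lam2 * coeff(2, n)
    mldeg = -chi + (-1) ** (3 + n - 1) * 2
    assert mldeg == 2 ** (n + 1) - 6
```

The restriction $n \ge 3$ matters: for $n = 2$ the factor $\min\{m,n\} - 1$ in (14) is 1 rather than 2, and indeed $\operatorname{MLdeg}(X_{32}) = 1 \ne 2^3 - 6$.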


The takeaway is that finitely many computations can determine infinitely many ML degrees. Using these techniques, we may be able to determine ML degrees of other varieties, such as symmetric matrices and Grassmannians, with a combination of applied algebraic geometry and topological arguments.


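Proof of Proposition 3.11. Recall that $\Lambda_0 = \chi(W_2^{(0)})$. By definition, $W_2^{(0)}$ consists of those $[b_{i1}\; b_{i2}]$ in $W_2$ such that the cardinality of the set $\{ b_{i1}/b_{i2} \mid 1 \le i \le m,\ b_{i1} \ne b_{i2} \}$ is equal to 1.

Notice that for $[b_{i1}\; b_{i2}] \in W_2^{(0)}$,

$$\sum_{\substack{1 \le i \le m \\ b_{i1} \ne b_{i2}}} b_{i1} = \sum_{\substack{1 \le i \le m \\ b_{i1} \ne b_{i2}}} b_{i2} = 0. \tag{26}$$

Indeed, the two sums agree because both columns sum to 1 and the remaining rows satisfy $b_{i1} = b_{i2}$; on the other hand, the first sum is $c$ times the second, where $c \ne 1$ is the common ratio $b_{i1}/b_{i2}$, so both sums vanish. Therefore, we can define a $\mathbb{C}^*$ action on the set of $m \times 2$ matrices $\{[b_{ij}]_{1 \le i \le m,\, j = 1,2}\}$ by setting $t \cdot (b_{ij}) = (b'_{ij})$, where

$$b'_{ij} = \begin{cases} b_{ij} & \text{if } b_{i1} = b_{i2}, \\ t \cdot b_{ij} & \text{otherwise.} \end{cases} \tag{27}$$

Now, it is straightforward to check that this $\mathbb{C}^*$ action preserves $W_2^{(0)}$ and has no fixed points on $W_2^{(0)}$. Recall that when an algebraic variety $M$ admits a $\mathbb{C}^*$ action, $\chi(M) = \chi(M^{\mathbb{C}^*})$, where $M^{\mathbb{C}^*}$ is the fixed locus of the action (see [6, Proposition 1.2]). Therefore, $\Lambda_0 = \chi(\emptyset) = 0$. □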
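The two facts used in the proof of Proposition 3.11, the vanishing sums (26) and the absence of fixed points of the action (27), can be seen on a small example. In the Python sketch below, the $3 \times 2$ matrix is a hypothetical point of $W_2^{(0)}$ chosen by us (its non-constant rows share the ratio $b_{i1}/b_{i2} = 2/3$); the helper names are ours.

```python
from fractions import Fraction as Fr


def act(t, x):
    """Action (27): scale the rows with b_i1 != b_i2 by t, fix the rows with b_i1 == b_i2."""
    return [[b1, b2] if b1 == b2 else [t * b1, t * b2] for b1, b2 in x]


def column_sums(x):
    return [sum(row[j] for row in x) for j in range(2)]


def ratios(x):
    """The set {b_i1 / b_i2 : b_i1 != b_i2}, which has exactly one element on W_2^(0)."""
    return {Fr(b1) / Fr(b2) for b1, b2 in x if b1 != b2}


x = [[Fr(1), Fr(1)], [Fr(2), Fr(3)], [Fr(-2), Fr(-3)]]
moving = [row for row in x if row[0] != row[1]]
assert column_sums(moving) == [0, 0]            # equation (26)

for t in (Fr(5), Fr(-1, 7), Fr(2, 3)):
    y = act(t, x)
    assert column_sums(y) == [1, 1]             # the image stays in W_2
    assert ratios(y) == ratios(x) == {Fr(2, 3)}  # and in W_2^(0)
    assert y != x                               # no fixed points for t != 1
```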
4. Recursions and closed form expressions

In this section, we use Theorem 3.4 to give recursions for the Euler characteristic $\chi(\mathcal{Y}_{mn})$, and thus for the ML degree of $X_{mn}$ by (14). We solve the recursions and give closed form expressions in Corollary 4.2.


4.1. The recurrence. Recall that $\mathcal{Y}_{mn}$ denotes the set of $m \times n$ rank 2 matrices with nonzero entries summing to 1. By (14), giving a recursion for $\chi(\mathcal{Y}_{mn})$ is equivalent to giving a recursion for $-\operatorname{MLdeg}(X_{mn}) + (-1)^{m+n-1}(m - 1)$, assuming $m \le n$. The next theorem gives the recursion for $\chi(\mathcal{Y}_{mn})$.

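Theorem 4.1. Fix $m$ and let $-c_i$ be the coefficient of $t^{m-i}$ in $(t+1)\prod_{j=1}^{m-1}(t - j)$. For $n > m$, we have

$$\chi(\mathcal{Y}_{mn}) = c_1 \chi(\mathcal{Y}_{m(n-1)}) + c_2 \chi(\mathcal{Y}_{m(n-2)}) + \dots + c_m \chi(\mathcal{Y}_{m(n-m)}).$$

Proof. By Theorem 3.4, we have

$$\chi(\mathcal{Y}_{mn}) = (-1)^{n-1} \sum_{1 \le i \le m-1} \frac{\Lambda_i}{i+1} - \sum_{1 \le i \le m-1} \frac{\Lambda_i}{i+1}\, i^{n-1}$$

for $n \ge 2$. Therefore, $\chi(\mathcal{Y}_{mn})$ satisfies an order $m$ linear homogeneous recurrence relation with constant coefficients. The coefficients of such a recurrence are described by the characteristic polynomial with roots $-1, 1, \dots, m-1$, i.e., $t^m - c_1 t^{m-1} - \dots - c_m = (t+1)\prod_{j=1}^{m-1}(t - j)$. □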
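Theorem 4.1 is easy to check numerically. The Python sketch below expands $(t+1)\prod_{j=1}^{m-1}(t-j)$ to read off $c_1, \dots, c_m$ and verifies the recurrence on the values $\chi(\mathcal{Y}_{mn})$ given by (15), taking the sequences $\Lambda_3, \Lambda_4, \Lambda_5$ of Table 2 as input. The helper names are ours.

```python
from fractions import Fraction as Fr


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def recurrence_coefficients(m):
    """[c_1, ..., c_m] with t^m - c_1 t^(m-1) - ... - c_m = (t+1)(t-1)...(t-(m-1))."""
    p = [1, 1]                      # t + 1
    for j in range(1, m):
        p = poly_mul(p, [1, -j])    # times (t - j)
    return [-a for a in p[1:]]


def chi_Y(lam, n):
    """chi(Y_mn) from (15), with lam = [Lambda_1, ..., Lambda_{m-1}]."""
    return sum(Fr(l) * ((-1) ** (n - 1) - i ** (n - 1)) / (i + 1)
               for i, l in enumerate(lam, start=1))


for lam in ([-12, 12], [50, -120, 72], [-180, 780, -1080, 480]):
    m = len(lam) + 1
    c = recurrence_coefficients(m)
    for n in range(m + 2, 20):
        assert chi_Y(lam, n) == sum(c[k - 1] * chi_Y(lam, n - k) for k in range(1, m + 1))
```

For $m = 3$ this gives $(c_1, c_2, c_3) = (2, 1, -2)$, i.e. $\chi(\mathcal{Y}_{3n}) = 2\chi(\mathcal{Y}_{3(n-1)}) + \chi(\mathcal{Y}_{3(n-2)}) - 2\chi(\mathcal{Y}_{3(n-3)})$.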


With these recurrences we determine the following table of ML degrees:

n \ m   | m = 2 | m = 3 | m = 4  | m = 5    | m = 6       | m = 7
n = m   | 1     | 10    | 191    | 6776     | 378477      | 30305766
n = m+1 | 1     | 26    | 843    | 40924    | 2865245     | 274740990
n = m+2 | 1     | 58    | 3119   | 212936   | 19177197    | 2244706374
n = m+3 | 1     | 122   | 10587  | 1015564  | 118430045   | 17048729886
n = m+4 | 1     | 250   | 34271  | 4586456  | 692277357   | 122818757286
n = m+5 | 1     | 506   | 107883 | 19984444 | 3892815965  | 850742384190
n = m+6 | 1     | 1018  | 333839 | 84986216 | 21284701677 | 5720543812614

Table 1. ML degrees of $X_{mn}$


Note that our methods are not limited to the values in Table 1. We give closed form formulas in the next subsection for infinite families of mixture models.

4.2. Closed form expressions. In this subsection, we provide additional closed form expressions. Using an inductive procedure (described in the proof of Corollary 4.2), we determine $\Lambda_m$ for small $m$; the procedure can be extended to arbitrary $m$. See the appendix for an implementation.
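Corollary 4.2. For fixed $m = 2, 3, \dots, 7$, the closed form expressions for $\operatorname{MLdeg}(X_{mn})$ with $m \le n$ are below:

$$\begin{aligned}
\operatorname{MLdeg}(X_{2n}) &= 1 \\
\operatorname{MLdeg}(X_{3n}) &= \left( -\tfrac{12}{2} \cdot 1^{n-1} + \tfrac{12}{3} \cdot 2^{n-1} \right) \\
\operatorname{MLdeg}(X_{4n}) &= \left( \tfrac{50}{2} \cdot 1^{n-1} - \tfrac{120}{3} \cdot 2^{n-1} + \tfrac{72}{4} \cdot 3^{n-1} \right) \\
\operatorname{MLdeg}(X_{5n}) &= \left( -\tfrac{180}{2} \cdot 1^{n-1} + \tfrac{780}{3} \cdot 2^{n-1} - \tfrac{1080}{4} \cdot 3^{n-1} + \tfrac{480}{5} \cdot 4^{n-1} \right) \\
\operatorname{MLdeg}(X_{6n}) &= \left( \tfrac{602}{2} \cdot 1^{n-1} - \tfrac{4200}{3} \cdot 2^{n-1} + \tfrac{10080}{4} \cdot 3^{n-1} - \tfrac{10080}{5} \cdot 4^{n-1} + \tfrac{3600}{6} \cdot 5^{n-1} \right) \\
\operatorname{MLdeg}(X_{7n}) &= \left( -\tfrac{1932}{2} \cdot 1^{n-1} + \tfrac{20412}{3} \cdot 2^{n-1} - \tfrac{75600}{4} \cdot 3^{n-1} + \tfrac{127680}{5} \cdot 4^{n-1} - \tfrac{100800}{6} \cdot 5^{n-1} + \tfrac{30240}{7} \cdot 6^{n-1} \right)
\end{aligned}$$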


Proof. We find these formulas using an inductive procedure to determine $\Lambda_m$ from $\Lambda_{m-1}$. With equations (14) and (15), we have the following $m - 1$ relations for $n = 2, 3, \dots, m$:

$$\begin{bmatrix} \operatorname{MLdeg}(X_{m2}) \\ \operatorname{MLdeg}(X_{m3}) \\ \vdots \\ \operatorname{MLdeg}(X_{mm}) \end{bmatrix} + (-1)^m \begin{bmatrix} 1 \\ -2 \\ \vdots \\ (-1)^m (m-1) \end{bmatrix} = M \begin{bmatrix} \Lambda_1/2 \\ \Lambda_2/3 \\ \vdots \\ \Lambda_{m-1}/m \end{bmatrix}, \quad\text{where}$$

$$M = \begin{pmatrix} 1 & 2 & \cdots & m-1 \\ 1^2 & 2^2 & \cdots & (m-1)^2 \\ \vdots & \vdots & & \vdots \\ 1^{m-1} & 2^{m-1} & \cdots & (m-1)^{m-1} \end{pmatrix} - \begin{pmatrix} (-1)^1 & (-1)^1 & \cdots & (-1)^1 \\ (-1)^2 & (-1)^2 & \cdots & (-1)^2 \\ \vdots & \vdots & & \vdots \\ (-1)^{m-1} & (-1)^{m-1} & \cdots & (-1)^{m-1} \end{pmatrix}.$$

For fixed $m$, this system of linear equations has $2m - 2$ unknowns: $\operatorname{MLdeg}(X_{mj})$ for $j = 2, \dots, m$ and $\Lambda_1, \dots, \Lambda_{m-1}$ of $\Lambda_m$. By induction, we may assume we know $\Lambda_{m-1}$. The sequence $\Lambda_{m-1}$ gives us a closed form expression for the ML degrees of $X_{(m-1)j}$ with $j \ge 2$. Since $\operatorname{MLdeg}(X_{(m-1)j}) = \operatorname{MLdeg}(X_{j(m-1)})$, we have reduced our system of linear equations to $m$ unknowns by substitution. By Proposition 4.4, $\Lambda_{m-1}$ of $\Lambda_m$ equals $(m-1) \cdot m!$. Substituting this value as well, we have a linear system of $m - 1$ equations in $m - 1$ unknowns: $\operatorname{MLdeg}(X_{mm}), \Lambda_1, \Lambda_2, \dots, \Lambda_{m-2}$.

A linear algebra argument shows that there exists a unique solution of the system, yielding each $\Lambda_j$ of $\Lambda_m$ as well as $\operatorname{MLdeg}(X_{mm})$. It proceeds as follows. Since the unknown $\operatorname{MLdeg}(X_{mm})$ appears only in the last equation, the linear system above has a unique solution if and only if the $(m-2) \times (m-2)$ upper left submatrix of $M$, denoted $N$, is nonsingular. Consider the row vector $c := [c_1, c_2, \dots, c_{m-2}]$ and let $f_c(x) = c_1 x + c_2 x^2 + \dots + c_{m-2} x^{m-2}$. We will show that $c$ is a null vector for $N$ if and only if $c$ is the zero vector. If $c$ is a null vector, then, by multiplying $c$ with $N$, we see

$$[f_c(1) - f_c(-1),\ f_c(2) - f_c(-1),\ \dots,\ f_c(m-2) - f_c(-1)]^T = 0.$$

In other words, $f_c(x) - f_c(-1)$, a polynomial of degree at most $m - 2$, has the $m - 1$ distinct roots $-1, 1, 2, \dots, m-2$, and so must be the zero polynomial. This means $f_c(x)$ is constant, and since $f_c$ has no constant term, $c = 0$.

Using the inductive procedure described above, we determined the following table of $\Lambda_m$ to yield the closed form expressions we desired. □

Remark 4.3. With our recursive methods, we determined the ML degree of $X_{20,20}$ to be

$$19\,674\,198\,689\,452\,133\,729\,973\,092\,792\,823\,813\,947\,695 \approx 1.967 \times 10^{40}.$$
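The inductive procedure in the proof of Corollary 4.2 can also be sketched outside Macaulay2. The Python version below is our own re-implementation, not the authors' appendix code: it replaces the ideal computation with exact Gauss-Jordan elimination, takes $\Lambda_{m-1} = (m-1) \cdot m!$ from Proposition 4.4, uses the symmetry $\operatorname{MLdeg}(X_{mn}) = \operatorname{MLdeg}(X_{nm})$, and starts from $\Lambda_2 = (2)$ as in Table 2. All function names are hypothetical.

```python
from fractions import Fraction as Fr
from math import factorial


def solve(A, b):
    """Solve A x = b over the rationals by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(r)] for row, r in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [v - M[r][c] * w for v, w in zip(M[r], M[c])]
    return [row[n] for row in M]


def chi_Y(lam, n):
    """(15): chi(Y_mn) with lam = [Lambda_1, ..., Lambda_{m-1}]."""
    return sum(Fr(l) * ((-1) ** (n - 1) - i ** (n - 1)) / (i + 1)
               for i, l in enumerate(lam, start=1))


def ml_degree(lams, m, n):
    """MLdeg(X_mn) from (14) and (15), using MLdeg(X_mn) = MLdeg(X_nm)."""
    m, n = min(m, n), max(m, n)
    return -chi_Y(lams[m], n) + (-1) ** (m + n - 1) * (m - 1)


def lambda_table(max_m):
    """Lambda_m for m = 2, ..., max_m, determined inductively as in Corollary 4.2."""
    lams = {2: [Fr(2)]}
    for m in range(3, max_m + 1):
        top = Fr((m - 1) * factorial(m))                      # Proposition 4.4
        A, b = [], []
        for n in range(2, m):                                 # equations n = 2, ..., m-1
            chi = -ml_degree(lams, n, m) + (-1) ** (m + n - 1) * (n - 1)  # (14)
            row = [Fr((-1) ** (n - 1) - i ** (n - 1), i + 1) for i in range(1, m)]
            A.append(row[:-1])
            b.append(chi - row[-1] * top)
        lams[m] = solve(A, b) + [top]
    return lams


lams = lambda_table(7)
assert lams[7] == [-1932, 20412, -75600, 127680, -100800, 30240]
assert [ml_degree(lams, 7, n) for n in (7, 8)] == [30305766, 274740990]
```

Because all arithmetic is exact, calling `ml_degree(lambda_table(20), 20, 20)` is one way to recompute the integer in Remark 4.3 independently of Macaulay2.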
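Sequence      | $\Lambda_1$ | $\Lambda_2$ | $\Lambda_3$ | $\Lambda_4$ | $\Lambda_5$ | $\Lambda_6$
$\Lambda_2$   | 2           |             |             |             |             |
$\Lambda_3$   | -12         | 12          |             |             |             |
$\Lambda_4$   | 50          | -120        | 72          |             |             |
$\Lambda_5$   | -180        | 780         | -1080       | 480         |             |
$\Lambda_6$   | 602         | -4200       | 10080       | -10080      | 3600        |
$\Lambda_7$   | -1932       | 20412       | -75600      | 127680      | -100800     | 30240

Table 2. The $\Lambda_i$ of $\Lambda_m$.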



While it is currently infeasible to track this large number of paths via numerical homotopy continuation as in [13], it may be possible to track far fewer paths using the topology of the variety. Preliminary results by the 2016 Mathematics Research Community in Algebraic Statistics Likelihood Geometry Group for weighted independence models deform a variety with ML degree one to a variety with large ML degree. This deformation deforms the maximum likelihood estimate of one model to the estimate of another. The stratifications presented here may lead to similar results in the case of mixtures.


Proposition 4.4. Let $\Lambda_k$ of $\Lambda_m$ be defined as in (19). Then $\Lambda_{m-1}$ of $\Lambda_m$ equals $(m-1) \cdot m!$.

Proof. Recall that $\Lambda_{m-1}$ of $\Lambda_m$ equals $\chi(W_2^{(m-1)})$. By definition, $W_2^{(m-1)}$ consists of all $[b_{i1}\; b_{i2}] \in W_2$ such that $b_{i1} \ne b_{i2}$ for all $1 \le i \le m$ and the ratios $b_{i1}/b_{i2}$ are distinct for $1 \le i \le m$.

Denote by $B_m$ the subset of $(\mathbb{C}^* \setminus \{1\})^m$ corresponding to $m$ distinct numbers. Then there is a natural map $\pi: W_2^{(m-1)} \to B_m$, defined by $[b_{ij}] \mapsto (b_{11}/b_{12}, \dots, b_{m1}/b_{m2})$. The map is surjective. Moreover, one can easily check that under the map $\pi$, $W_2^{(m-1)}$ is a fiber bundle over $B_m$. In addition, the fiber is isomorphic to the complement of $m$ hyperplanes in $\mathbb{C}^{m-2}$ in general position. By Lemma 3.5, the fiber has Euler characteristic $(-1)^{m-2}(m-1)$.

The Euler characteristic of $B_m$ is equal to $(-1)^m \cdot m!$ by induction. In fact, $B_m$ is a fiber bundle over $B_{m-1}$ with fiber homeomorphic to $\mathbb{C}^*$ minus $m$ distinct points, whose Euler characteristic is $-m$. Therefore,

$$\chi(W_2^{(m-1)}) = (-1)^{m-2}(m-1) \cdot (-1)^m m! = (m-1) \cdot m!.$$

□
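The Euler characteristic bookkeeping in this proof can be mirrored in a few lines of Python. The sketch below iterates the fibration $B_m \to B_{m-1}$ (fiber $\mathbb{C}^*$ minus $m$ points, that is, $\mathbb{C}$ minus $m + 1$ points, with Euler characteristic $-m$), multiplies by the fiber Euler characteristic supplied by Lemma 3.5, and compares the result with the last entries of Table 2. The function names are ours.

```python
from math import factorial


def chi_B(m):
    """chi(B_m) via B_m -> B_(m-1); B_1 = C* minus {1} has Euler characteristic -1."""
    return -1 if m == 1 else -m * chi_B(m - 1)


def top_lambda(m):
    """Lambda_(m-1) = chi(fiber) * chi(B_m), as in the proof of Proposition 4.4."""
    chi_fiber = (-1) ** (m - 2) * (m - 1)   # Lemma 3.5 with r = m, s = m - 2
    return chi_fiber * chi_B(m)


assert all(chi_B(m) == (-1) ** m * factorial(m) for m in range(1, 10))
assert [top_lambda(m) for m in range(3, 8)] == [12, 72, 480, 3600, 30240]
assert all(top_lambda(m) == (m - 1) * factorial(m) for m in range(3, 12))
```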


5. Conclusion and additional questions 

We have developed the topological tools to determine the ML degree of singular models. We 
proved a closed form expression for 3 x n matrices with rank 2 conjectured by [13]. In addition, 
our results provide a recursion to determine the ML degree of a mixture of independence 
models where the first random variable has m states and the second random variable has n 
states. Furthermore, we have shown how a combination of computational algebra calculations 
and topological arguments can determine an infinite family of ML degrees. 

The next natural question is to determine the ML degree for higher order mixtures (rank r 
matrices for r > 2). Our results give closed form expressions in the corank 1 case by maximum 
likelihood duality [5]. Maximum likelihood duality is quite surprising here because our methods 
would initially suggest that the corank 1 matrices have a much more complicated ML degree. 
It would be very nice to give a topological proof of maximum likelihood duality in terms of Euler characteristics. One possible approach is to apply these techniques to the dual maximum likelihood estimation problem described in [22].

Another question concerns the boundary components of statistical models as described in [19] 
for higher order mixtures. Can we also use these topological methods to give closed form 
expressions of the ML degrees of the boundary components of the statistical model? 

Finally, one should notice that the formulas in Corollary 4.2 for the ML degrees involve 
alternating signs. It would be great to give a canonical transformation of our alternating sum 
formula into a positive sum formula. One motivation for why this might be possible comes from
the work of [12]. Here, entries of the data u are degenerated to zero and some of the critical
points go to the boundary of the algebraic torus (i.e. coordinates go to zero). Based on how the 
critical points go to the boundary, one partitions the ML degree into a sum of positive integers. 
Appendix

The following code creates a function called recursiveMLDegrees that computes a table of ML degrees and other Euler characteristics. The input consists of two integers $m, n$ with $m \le n$. The output is a sequence of four elements.

• Element 0 of the output is the ML degree of $X_{mn}$.

• Element 1 of the output is the list $\Lambda_m = \{\Lambda_1, \Lambda_2, \dots, \Lambda_{m-1}\}$.

• Element 2 of the output is a table of ML degrees given by a list of lists.

• Element 3 of the output lists $\Lambda_i$ for $i = 2, 3, \dots, m$.

For example, recursiveMLDegrees(3,4) returns

(26, {-12, 12}, {{1, 1, 1}, {10, 26}}, {{2}, {-12, 12}})

and (recursiveMLDegrees(7,n))_3 returns the entries of Table 2 for $n \ge 7$. Here is the code.

--MACAULAY2 CODE--

recursiveMLDegrees=(M,N)->(mlR=QQ[a_0..a_N]**QQ[c_0..c_N]**QQ[ML_0..ML_N];

-- mlEquations gives the relations in the proof of Corollary 4.2.

mlEquations=(m,n,mldegX)->((-mldegX+(-1)^(m+n-1)*(min(m,n)-1))-
    sum for i from 1 to m-1 list (a_i*((-1)^(n-1)-i^(n-1))/(i+1)));

-- M2N lists MLdeg(X_2j) for j=2,3,...,N; these ML degrees are 1.

M2N:=for i from 2 to N list 1;

-- tableML gives a list of lists of ML degrees.
-- tableLam gives, for i=2,3,...,M, the lists Lambda_i={lambda_1,lambda_2,...,lambda_(i-1)}.

tableML:={M2N};tableLam:={{2}};

-- The loop constructs Lambda_i and ML degrees using the recursion in the proof of Cor. 4.2.

for fixM from 3 to M do (
    mldList:=(for i from 2 to fixM-1 list tableML_(i-2)_(fixM-i))|{ML_fixM};
    solve1:=radical ideal({a_(fixM-1)-fixM!*(fixM-1)}|
        for fixN from 2 to fixM list mlEquations(fixM,fixN,mldList_(fixN-2)));
    newLam:=for j from 1 to fixM-1 list a_j % solve1;
    newLamSub:=for j from 1 to fixM-1 list a_j=>newLam_(j-1);
    tableLam=append(tableLam,(toList newLamSub)/last);
    newMLdegrees:=for i from fixM to N list (sum for j from 1 to N list
        if j>=fixM then 0 else sub(a_j/(j+1)*j^(i-1), a_j=>newLam_(j-1)));
    tableML=append(tableML,newMLdegrees));

-- The output of the function has four elements.

return (last last tableML,last tableLam,tableML,tableLam))

--EXAMPLE--

INPUT: recursiveMLDegrees(3,4)

OUTPUT: (26, {-12, 12}, {{1, 1, 1}, {10, 26}}, {{2}, {-12, 12}})

--EXAMPLE--

INPUT: (recursiveMLDegrees(3,4))_0
OUTPUT: 26